Introduction to Cancer Genomics Databases


Presentation at the Canadian Cancer Research Conference satellite bioinformatics.ca workshop. This one is an introduction to the TCGA, ICGC and COSMIC databases.

Speaker notes:
  • Slide 19: Consecutive base pairs.
  • Slide 29: A few notes on ICGC.
  • Slide 33: Ensembl 61 (human) has 53,515 annotated gene loci, which explains the high numbers of genes affected by SSMs (I’ve double-checked these numbers).
  • Slide 59: Summary page with a basic gene description and a list of curated publications. Click on Histogram to view the distribution of mutations.

    1. Canadian Cancer Research Conference, November 3-6, 2013. Canadian Bioinformatics Workshops, www.bioinformatics.ca
    3. You are free to copy, share, adapt, or re-mix; photograph, film, or broadcast; blog, live-blog, or post video of this presentation, provided that you attribute the work to its author and respect the rights and licenses associated with its components. Slide concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero. Social media icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites. Module 1: Cancer Genomic Databases bioinformatics.ca
    4. Module 1: Cancer Genomic Databases
    5. E-mail: francis@oicr.on.ca, @bffo
    6. Schedule for Module 1: Cancer Genomic Databases
       • The Databases:
         – The International Cancer Genome Consortium (ICGC)
         – The Cancer Genome Atlas (TCGA)
         – The Catalogue of Somatic Mutations in Cancer (COSMIC)
       • Data Access: human genomes, security and privacy issues, Open vs. Controlled Access data
    8. http://bioinformatics.ca/
    10. Workshops planned for 2014 (http://bioinformatics.ca/workshops):
        1. Exploratory Analysis of Biological Data using R
        2. Bioinformatics for Cancer Genomics
        3. Informatics for RNA-sequence Analysis
        4. Informatics on High Throughput Sequencing Data
        5. Pathway and Network Analysis of -omics Data
        6. Flow Cytometry Data Analysis using R
        7. Microarray Data Analysis
        8. Informatics and Statistics for Metabolomics
    11. http://bioinformatics.ca/workshops/2013
    12. E-mail: course_info@bioinformatics.ca; Web: http://bioinformatics.ca; Workshop announcement mailing list: http://bioinformatics.ca/mailman/listinfo/announce
    13. Soap-Box time!
        • Open Access, Open Data and Open Source are essential for good science.
        • Openness is a responsibility, an obligation, and something that comes with the privilege of doing publicly funded work.
        (Open Source, Open Access, Open Data, Opencourseware)
    15. "Cancer therapy is like beating the dog with a stick to get rid of his fleas." (Anna Deavere Smith, Let Me Down Easy)
    16. http://goo.gl/Yhbsj
    17. "The revolution in cancer research can be summed up in a single sentence: cancer is, in essence, a genetic disease." (Bert Vogelstein)
    18. Cancer: a Disease of the Genome
        Challenge in treating cancer:
        • Every tumour is different
        • Every cancer patient is different
    19. Cancer Genomic Databases: Chin et al., Genes Dev. 2011 March 15; 25(6): 534-555. http://www.ncbi.nlm.nih.gov/pubmed/?term=21406553
    20. TCGA: The Cancer Genome Atlas is a comprehensive and coordinated effort to accelerate our understanding of the molecular basis of cancer through the application of genome analysis technologies, including large-scale genome sequencing.
    21. About the TCGA
        • National Cancer Institute (NCI)
        • National Human Genome Research Institute (NHGRI)
        • Phased structure:
          – Three-year pilot in 2006, with an investment of $50 million from each institute
          – TCGA will collect and characterize more than 20 additional tumour types (now at 16)
    22. Where to start with the TCGA? Wiki: https://wiki.nci.nih.gov/display/TCGA/About+TCGA
    23. Division of Labour
        • Biospecimen Core Resource (BCR): centre where samples are carefully catalogued, processed, quality-checked and stored along with participant clinical information
        • Genome Sequencing Centre (GSC): uses high-throughput methods to identify changes to DNA sequences that are associated with specific cancer types
        • Genome Characterization Centre (GCC): uses high-throughput technologies to analyze genomic changes involved in cancer
        • Genome Data Analysis Centre (GDAC): provides novel informatics tools to the research community, and provides analysis results using TCGA data
        • Data Coordinating Centre (DCC): central provider of TCGA data; standardizes data formats and validates submitted data
    24. TCGA Data
        • Sequence reads from newer sequencing technologies are available at the Cancer Genome Hub: https://cghub.ucsc.edu/
        • Higher-level sequence data (variation calls and abundance measures) are available at the TCGA Portal: http://cancergenome.nih.gov/
    25. TCGA data flow: http://goo.gl/b5nojx
    26. Data Coordinating Centre
        • Plays a central role:
          – Receiving data from BCR, GSC and GCC sites
          – Providing access to users
          – Performing analysis of data
        • Responsibilities:
          – Protecting participant privacy and confidentiality
          – Developing data standards and controlled vocabularies
          – Establishing informatics pipelines for data flow
          – Developing new analytical and visualization technologies to facilitate data analysis, for all audiences
    27. TCGA DCC Data Portal
        • Provides a platform to search, download and analyze TCGA data sets
        • Two data access tiers: Open and Controlled
        • Analytic tools include Cancer Molecular Analysis and Cancer Genome Workbench (NCBIB), Integrative Genomics Viewer (Broad) and Cancer Genomics Analysis (MSKCC)
    28. TCGA Data Browser: https://tcga-data.nci.nih.gov/tcga/ Query TCGA data online using the TCGA Data Browser.
    29. The International Cancer Genome Consortium (ICGC)
        • http://www.icgc.org/
        • “ICGC was launched to coordinate large-scale cancer genome studies in tumours from 50 different cancer types and/or subtypes that are of clinical and societal importance across the globe”
    30. Diagram of data sets: ICGC BAM/FASTQ; ICGC Open Data (includes TCGA Open Data); COSMIC Open Data; TCGA BAM/FASTQ
    31. ICGC Map, November 2013: 67 projects launched
    32. ICGC datasets to date (Hardeep Nahal): chart of the cumulative donor count for member projects in the ICGC Data Portal, Releases 7 to 14, December 2011 to September 2013 (y-axis: number of donors, up to 10,000)
    33. ICGC dataset version 14, September 2013 (Hardeep Nahal)
        • Cancer types: 41
        • Donors: 8,532 (18,056 specimens)
        • Simple somatic mutations: 1,995,134
        • Copy number mutations: 18,526,593
        • Structural rearrangements: 18,614
        • Genes affected* by simple somatic mutations: 22,074
        • Genes affected* by non-synonymous coding mutations: 19,150
        • Genes affected* by copy number mutations: 20,341
        • Genes affected* by structural rearrangements: 1,884
        • Open-tier and controlled-access data currently available
        *Out of 22,259 protein-coding genes annotated in Ensembl Human release 69
    36. Select “Pancreatic cancer – Canada”
    37. … But where is the data?
    39. http://dcc.icgc.org/
    42. You can also bulk-download the data.
    43. Diagram relating ERA, TCGA, DACO, ICGC, dbGaP and EGA to BAM files (label: "+ EGA id")
    45. http://icgc.org/daco
    46. ICGC Controlled Access Datasets
        • Detailed phenotype and outcome data: region of residence, risk factors, examination, surgery, radiation, sample, slide, specific histological features, analyte, aliquot, donor notes
        • Gene expression (probe-level data)
        • Raw genotype calls
        • Gene-sample identifier links
        • Genome sequence files
        ICGC Open Access Datasets
        • Cancer pathology: histologic type or subtype, histologic nuclear grade
        • Patient/person: gender, age range, vital status, survival time, relapse type, status at follow-up
        • Gene expression (normalized)
        • DNA methylation
        • Computed copy number and loss of heterozygosity
        • Newly discovered somatic variants
        http://goo.gl/w4mrV
    47. Identify yourself, then fill out a detailed form that includes:
        • Contact and project information
        • Information technology details and procedures for keeping data secure
        • Data access agreement
        All of these documents are put into a PDF file that you print and get your institution to sign off on your behalf.
    54. DACO approved projects
    55. DACO/DCC User Data Access Process
        • Users approved through DACO are now automatically granted access to ICGC controlled-access datasets available through the ICGC Data Portal and the EBI’s EGA repository
        Diagram: DACO Web Application; application approved by DACO; user accounts activated; DCC User Registry; DCC Data Portal; EBI EGA
    56. Catalogue of Somatic Mutations in Cancer (COSMIC)
        • http://cancer.sanger.ac.uk/cancergenome/projects/cosmic/
        • COSMIC is designed to store and display somatic mutation information and related details, and contains information relating to human cancers.
    57. COSMIC
        • Somatic mutations only
        • Diverse sources:
          – Literature (arrays, next-gen, PCR...)
          – TCGA
          – ICGC
        • Diverse ways to look at data:
          – Gene
          – Variation
          – Tumour type
          – Cell line
          – Experiment
    58. FAQ
    59. Looking up your favorite gene (screenshot annotated with steps 1, 2 and 3)
    62. In closing
        • Remember that all these sites have extensive documentation.
        • The field is changing quickly, and so are the portals.
        • New features are planned as we speak, so use the sites and keep coming back.
        • Don’t be afraid to explore.
        • Interested in learning more after today? Consider one of the bioinformatics.ca workshops!
    63. Acknowledgements, the CBW gang: Michael Brudno, Michael Stromberg, Michelle Brazas, Marc Fiume
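Slides 23 and 26 describe how TCGA divides the work between centres. As a rough illustration, that division can be written as a dependency graph from which an order of handling falls out. This is a simplified sketch under stated assumptions: the centre abbreviations come from the slides, but the edges (the BCR supplying the GSCs and GCCs, the GDACs working from DCC data) are an interpretation, the names `TCGA_FLOW` and `processing_order` are hypothetical, and the model ignores that raw reads were held at CGHub rather than the DCC (slide 24).

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified model of the TCGA division of labour (slides 23 and 26).
# Each centre maps to the set of centres it receives samples or data from.
TCGA_FLOW: dict[str, set[str]] = {
    "GSC": {"BCR"},                # assumption: sequencing centres work on BCR-processed samples
    "GCC": {"BCR"},                # assumption: characterization centres likewise
    "DCC": {"BCR", "GSC", "GCC"},  # slide 26: the DCC receives data from BCR, GSC and GCC sites
    "GDAC": {"DCC"},               # assumption: analysis centres work from data served by the DCC
}


def processing_order(flow: dict[str, set[str]]) -> list[str]:
    """Return the centres in an order where every supplier precedes its consumers."""
    # TopologicalSorter takes a mapping from each node to its predecessors,
    # which is how TCGA_FLOW is written; it raises CycleError if a cycle exists.
    return list(TopologicalSorter(flow).static_order())


if __name__ == "__main__":
    print(" -> ".join(processing_order(TCGA_FLOW)))
```

In the resulting order the BCR comes first and the GDACs last, with the DCC after both the sequencing and characterization centres. A topological sort is used because only the relative order of suppliers and consumers matters; the GSC and GCC may appear in either order.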
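The headline counts on slide 33 are easier to compare as proportions. This short sketch recomputes them from the slide's own numbers. The variable and function names are illustrative. The percentages use the slide's denominator of 22,259 protein-coding genes, and the speaker note for slide 33 explains why the affected-gene counts run so high.

```python
# Headline counts from ICGC dataset release 14 (September 2013), copied from slide 33.
PROTEIN_CODING_GENES = 22_259  # Ensembl Human release 69, as stated on the slide

genes_affected = {
    "simple somatic mutations": 22_074,
    "non-synonymous coding mutations": 19_150,
    "copy number mutations": 20_341,
    "structural rearrangements": 1_884,
}

donors, specimens, ssms = 8_532, 18_056, 1_995_134


def percent_of_genes(count: int, total: int = PROTEIN_CODING_GENES) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


if __name__ == "__main__":
    for kind, count in genes_affected.items():
        print(f"{kind}: {count:,} genes ({percent_of_genes(count)}%)")
    # Averages across all donors; individual donors can differ widely.
    print(f"specimens per donor: {specimens / donors:.2f}")
    print(f"SSMs per donor: {ssms / donors:.0f}")
```

With these inputs, simple somatic mutations touch about 99.2% of the protein-coding genes and structural rearrangements about 8.5%. On average there are roughly 234 simple somatic mutations and 2.12 specimens per donor.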
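Slide 46 splits ICGC data into controlled-access and open-access categories. A script that plans a download could encode that split as a lookup table, as sketched below. The category keys are hypothetical labels for the slide's headings. The classification reflects the slide (2013), not necessarily current policy. `needs_daco_approval` is an illustrative helper, not an ICGC API.

```python
from enum import Enum


class Tier(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"


# Data categories as listed on slide 46; the short keys are hypothetical labels.
ICGC_TIERS: dict[str, Tier] = {
    "detailed_phenotype_and_outcome": Tier.CONTROLLED,
    "gene_expression_probe_level": Tier.CONTROLLED,
    "raw_genotype_calls": Tier.CONTROLLED,
    "gene_sample_identifier_links": Tier.CONTROLLED,
    "genome_sequence_files": Tier.CONTROLLED,
    "cancer_pathology": Tier.OPEN,
    "patient_person": Tier.OPEN,
    "gene_expression_normalized": Tier.OPEN,
    "dna_methylation": Tier.OPEN,
    "computed_copy_number_and_loh": Tier.OPEN,
    "newly_discovered_somatic_variants": Tier.OPEN,
}


def needs_daco_approval(categories: list[str]) -> list[str]:
    """Return, sorted, the requested categories that fall in the controlled tier."""
    unknown = [c for c in categories if c not in ICGC_TIERS]
    if unknown:
        # Fail loudly rather than silently treating an unknown category as open.
        raise KeyError(f"unrecognized data categories: {unknown}")
    return sorted(c for c in categories if ICGC_TIERS[c] is Tier.CONTROLLED)
```

For example, `needs_daco_approval(["dna_methylation", "genome_sequence_files"])` returns `["genome_sequence_files"]`, because only the sequence files are controlled. Unknown categories raise an error instead of defaulting to a tier, since misclassifying controlled data as open is the costlier mistake.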
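Slides 47 and 55 describe the DACO application as a fixed sequence of steps. The minimal state machine below models that sequence. It assumes the stage names and form-section keys shown here, which are hypothetical and do not reflect DACO's own terminology or software.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """Hypothetical stages of an ICGC controlled-access application (slides 47 and 55)."""
    IDENTIFIED = 1          # applicant has identified themselves
    FORM_COMPLETED = 2      # detailed form filled out
    INSTITUTION_SIGNED = 3  # printed PDF signed off by the applicant's institution
    DACO_APPROVED = 4       # application approved by DACO
    ACCESS_ACTIVATED = 5    # user accounts activated for the ICGC Data Portal and EBI EGA


# Hypothetical keys for the three parts of the form listed on slide 47.
REQUIRED_SECTIONS = frozenset({"contact_and_project", "it_security", "data_access_agreement"})


@dataclass
class Application:
    applicant: str
    sections_done: set[str] = field(default_factory=set)
    stage: Stage = Stage.IDENTIFIED

    def advance(self) -> Stage:
        """Move to the next stage; refuse to skip the form or go past activation."""
        if self.stage is Stage.ACCESS_ACTIVATED:
            raise ValueError("access is already active")
        if self.stage is Stage.IDENTIFIED:
            missing = REQUIRED_SECTIONS - self.sections_done
            if missing:
                raise ValueError(f"form incomplete, missing: {sorted(missing)}")
        self.stage = Stage(self.stage + 1)
        return self.stage
```

A typical run fills in all three sections with `app.sections_done.update(REQUIRED_SECTIONS)`, then calls `app.advance()` until the stage is `Stage.ACCESS_ACTIVATED`. Because `Stage` is an `IntEnum`, the stages stay ordered, so checks such as `app.stage >= Stage.DACO_APPROVED` also work.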
