International Cancer Genomics Consortium (ICGC) Data Coordinating Center

EBI sponsored oncogenomics workshop presentation from Francis Ouellette


  1. 1. The International Cancer Genome Consortium (ICGC) Data Coordinating Center (DCC), November 14th 2013. B.F. Francis Ouellette, francis@oicr.on.ca • Senior Scientist & Associate Director, Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, ON • Associate Professor, Department of Cell and Systems Biology, University of Toronto, Toronto, ON.
  2. 2. Module #: Title of Module 2
  3. 3. You are free to: Copy, share, adapt, or re-mix; Photograph, film, or broadcast; Blog, live-blog, or post video of; This presentation. Provided that: You attribute the work to its author and respect the rights and licenses associated with its components. Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero. Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at; http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites 3
  4. 4. Slides are on slideshare.net • http://www.slideshare.net/bffo/ebi-oncogenomics-nov2013ouellettever03 http://goo.gl/HP613K 4
  5. 5. E-mail francis@oicr.on.ca @bffo 5
  6. 6. Disclaimer I do not (and will not) profit in any way, shape or form, from any of the brands, products or companies I may mention. 6
  7. 7. 7
  8. 8. Cancer therapy is like beating the dog with a stick to get rid of his fleas. - Anna Deavere Smith, Let me down easy 8
  9. 9. http://goo.gl/Yhbsj 9
  10. 10. The revolution in cancer research can be summed up in a single sentence: cancer is, in essence, a genetic disease. - Bert Vogelstein 10
  11. 11. Cancer: A Disease of the Genome. Challenge in Treating Cancer: • Every tumor is different • Every cancer patient is different 11
  12. 12. Large-Scale Studies of Cancer Genomes • Johns Hopkins: > 18,000 genes analyzed for mutations; 11 breast and 11 colon tumors (L.D. Wood et al, Science, Oct. 2007) • Wellcome Trust Sanger Institute: 518 genes analyzed for mutations; 210 tumors of various types (C. Greenman et al, Nature, Mar. 2007) • TCGA (NIH): multiple technologies; brain (glioblastoma multiforme), lung (squamous carcinoma), and ovarian (serous cystadenocarcinoma) (F.S. Collins & A.D. Barker, Sci. Am, Mar. 2007) 12
  13. 13. Lessons learned • Heterogeneity within and across tumor types • High rate of abnormalities (driver vs passenger) • Sample quality matters 13
  14. 14. 2007
  15. 15. International Cancer Genome Consortium • Collect ~500 tumour/normal pairs from each of 50 different major cancer types; • Comprehensive genome analysis of each T/N pair: – Genome – Transcriptome – Methylome – Clinical data • Make the data available to the research community & public. Identify genome changes: …GATTATTCCAGGTAT… vs …GATTATTGCAGGTAT… 15
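The "Identify genome changes" comparison on slide 15 can be illustrated with a minimal Python sketch. It assumes the two sequences printed on the slide are already aligned and of equal length, and that the first is the normal read and the second the tumour read (the slide does not say which is which). `find_substitutions` is a hypothetical helper for illustration, not part of any ICGC tool.

```python
def find_substitutions(normal: str, tumour: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, normal base, tumour base) for each mismatch."""
    if len(normal) != len(tumour):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, n_base, t_base)
        for pos, (n_base, t_base) in enumerate(zip(normal, tumour))
        if n_base != t_base
    ]


# Sequences as printed on the slide (assumed: normal first, tumour second).
NORMAL = "GATTATTCCAGGTAT"
TUMOUR = "GATTATTGCAGGTAT"

if __name__ == "__main__":
    for pos, n_base, t_base in find_substitutions(NORMAL, TUMOUR):
        print(f"position {pos}: {n_base} -> {t_base}")
```

Real somatic mutation calling works on millions of aligned reads with error models; this only shows the tumour/normal comparison idea.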
  16. 16. Rationale for the ICGC • The scope is huge, such that no country can do it all. • Coordinated cancer genome initiatives will reduce duplication of effort for common and easy to acquire tumor samples and ensure complete studies for many less frequent forms of cancer. • Standardization and uniform quality measures across studies will enable the merging of datasets, increasing power to detect additional targets. • The spectrum of many cancers varies across the world for many tumor types, because of environmental, genetic and other causes. • The ICGC will accelerate the dissemination of genomic and analytical methods across participating sites, and the user community. 16
  17. 17. International Cancer Genome Consortium (ICGC) Goals • Catalogue genomic abnormalities in tumors in 50 different cancer types and/or subtypes of clinical and societal importance across the globe • Generate complementary catalogues of transcriptomic and epigenomic datasets from the same tumors • Make the data available to the research community rapidly with minimal restrictions to accelerate research into the causes and control of cancer. 50 tumor types and/or subtypes; 500 tumors + 500 controls per subtype; 50,000 Human Genome Projects! Nature (2010) 464:993 17
  18. 18. Analysis Data Types • Simple Somatic Mutations • Copy Number Alterations • Structural Somatic Mutations • Gene Expression (micro-arrays and RNASeq) • miRNA Expression (RNASeq) • Epigenomics (Arrays and Methylation) • Splicing Variation • Protein Expression 18
  19. 19. 19
  20. 20. OICR’s mission To build innovative research programs that will have an impact on the prevention, early detection, diagnosis and treatment of cancer. 20
  21. 21. OICR Informatics & Biocomputing Senior Staff • Lincoln Stein, Director, I&B, Sr. PI • Vincent Ferretti, Assoc. Director, Bioinf. Software Dev, Sr. PI • Francis Ouellette, Assoc. Director, I&B • Paul Boutros, Jr. PI • Lakshmi Muthuswamy, Jr. PI • David Sutton, Director, IT • Paul Shoichet BrianBoutros Jr. PI Sr. PI May 2013 • Tatiana Lomasko, Program Manager • Jared Simpson, OICR Fellow, May 2013 21
  22. 22. http://icgc.org 22
  23. 23. ICGC Map – November 2013 67 projects launched 23
  24. 24. ICGC Committees & Working Groups http://icgc.org/icgc/committees-and-working-groups 24
  25. 25. ICGC Project Teams @ OICR • ICGC Secretariat – Executive Chair: Thomas Hudson – Senior Project Manager: Jennifer Jennings – Administrative Coordinator: Jaypee Banlawi • (with the support of the Web Development team) • ICGC Data Coordination Center (DCC) – DCC Leader: Lincoln Stein – DCC Co-Leader: Francis Ouellette – DCC Software Development Team Leader: Vincent Ferretti (+6 FTE) – DCC Data Curation: Hardeep Nahal (+1 FTE) 25
  26. 26. DCC Activities DCC activities are split between two groups: • Software Development – DCC portal – Submission tool • Curation (and Content Management) – Data level management – Submitter “handling” – Coordination with secretariat – User support http://dcc.icgc.org/team 26
  27. 27. ICGC Data Coordination Centre A “comprehensive management system” providing: • Secure mechanism for uploading data • Track uploads and perform integrity checks • Regular progress reporting (data audit) • Quality checks (coverage, correctness, etc.) • Enable distribution of raw data to public repositories • Provide essential metadata to public repositories • Integrate with other public repositories via standard data formats, ontologies, etc. 27
  28. 28. ICGC Data Coordination Centre (2) Provides the following support to experimental biologists, computational biologists, and other researchers: • Download of complete dataset, or subsets • Restrict protected data to authorized users (controlled access) • Search data by gene or specimen, or lists thereof • Interactive system for identifying specimens of interest, finding what data sets are available for those specimens, selecting data slices across those specimens (e.g., counts of the number of somatic mutations observed in a region within the UTR of a gene of interest), and running basic analytic tests on those data slices 28
  29. 29. ICGC Data Types • Clinical Data – Hosted by DCC via data portal – Was 100% open access, but currently 9 data elements have been flagged by DACO as controlled access and are under review by IDAC • Experimental Analysis Data – Hosted by DCC via data portal – Somatic is open access, germline is controlled • “Raw” Sequencing Data (+ array data, etc.) – Hosted at other public repositories – Primary repository for ICGC sequence data is EBI EGA – TCGA raw data hosted at CGhub 29
  30. 30. Hardeep Nahal: ICGC datasets to date. [Chart: ICGC Data Portal Cumulative Donor Count for Member Projects; y-axis: Number of Donors (up to 10,000); x-axis: Dec 2011 to Sept 2013; data points labelled Release 7 through Release 14] 30
  31. 31. ICGC dataset version 14, September 2013 (Hardeep Nahal) • Cancer types: 41 • Donors: 8,532 (18,056 specimens) • Simple somatic mutations: 1,995,134 • Copy number mutations: 18,526,593 • Structural rearrangements: 18,614 • Genes affected* by simple somatic mutations: 22,074 • Genes affected* by non-synonymous coding mutations: 19,150 • Genes affected* by copy number mutations: 20,341 • Genes affected* by structural rearrangements: 1,884 • *out of 22,259 protein coding genes annotated in Ensembl Human release 69 • Open tier and controlled data currently available
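To put the "genes affected" counts on slide 31 in proportion, here is a short Python sketch that converts each count into a percentage of the 22,259 protein coding genes quoted on the slide. The counts come from the slide; the dictionary layout and the `percent_affected` helper are illustrative assumptions.

```python
# Total protein coding genes (Ensembl Human release 69), as quoted on the slide.
PROTEIN_CODING_GENES = 22_259

# Genes affected per mutation class, as quoted on the slide.
GENES_AFFECTED = {
    "simple somatic mutations": 22_074,
    "non-synonymous coding mutations": 19_150,
    "copy number mutations": 20_341,
    "structural rearrangements": 1_884,
}


def percent_affected(count: int, total: int = PROTEIN_CODING_GENES) -> float:
    """Percentage of protein coding genes affected, rounded to one decimal."""
    return round(100.0 * count / total, 1)


if __name__ == "__main__":
    for mutation_class, count in GENES_AFFECTED.items():
        print(f"{mutation_class}: {percent_affected(count)}%")
```

In Release 14, almost every protein coding gene carries at least one simple somatic mutation in some donor, while fewer than one gene in ten is hit by a structural rearrangement.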
  32. 32. Key DCC Activities for 2013 • Improved data & metadata curation at EGA; better linking of data held at DCC to ICGC data in other repositories (currently not perfect) • Improved data quality/integrity checking through new submission/validation system; review of submission file specifications • Integration of new data submission system and portal infrastructure with project and user information managed at ICGC.org 32
  33. 33. Moratorium: http://www.icgc.org/icgc/goals-structure-policies-guidelines/e3-publication-policy 33
  34. 34. Where do you find that information? • We actually make it hard to find, but we are working on that! (this is an example of where ICGC would like to do what TCGA does!) • http://cancergenome.nih.gov/publications/publicationguidelines 34
  35. 35. Where do you find that information? For ICGC data: • Need to find the policy! • http://icgc.org/icgc/goals-structure-policies-guidelines/e3-publication-policy • Find text: • Published: no embargo • < 100 tumors: 2 years • > 100 tumors: 1 year • Find date: in README on FTP file • (exception in README) • This is bad, we know it, and we are fixing it! • In doubt? Contact us! info@icgc.org 35
  36. 36. Time limits for publication moratoriums: All data shall become free of a publication moratorium when either: 1) the data is published by the ICGC member project; or 2) one year after a specified quantity of data (e.g. genome dataset from 100 tumours per project) has been released via the ICGC database or other public databases. 3) In all cases data shall be free of a publication moratorium two years after its initial release. 36
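The three moratorium rules on slide 36 amount to "whichever date comes first". A minimal Python sketch of that reading follows. The function and parameter names are hypothetical, and the policy document linked on slide 33 remains the authority. This sketch only assumes that the inputs are the initial release date, the optional date on which the specified data quantity was reached, and the optional publication date.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # 29 Feb in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def moratorium_end(
    initial_release: date,
    threshold_release: Optional[date] = None,
    published_on: Optional[date] = None,
) -> date:
    """Earliest date on which the data is free of the publication moratorium."""
    # Rule 3: in all cases, two years after initial release.
    candidates = [add_years(initial_release, 2)]
    # Rule 2: one year after the specified quantity of data was released.
    if threshold_release is not None:
        candidates.append(add_years(threshold_release, 1))
    # Rule 1: the member project has published the data.
    if published_on is not None:
        candidates.append(published_on)
    return min(candidates)


if __name__ == "__main__":
    print(moratorium_end(date(2013, 9, 1), threshold_release=date(2013, 12, 1)))
```

Because rule 3 always applies, the function needs only the initial release date. The other two inputs can only bring the end date forward.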
  37. 37. [Diagram of raw data repositories and access; recoverable labels: ERA, EGA, DACO, dbGaP, TCGA BAM, ICGC BAM, Open, Germ Line, + EGA id] 37
  38. 38. [Diagram labels: ICGC BAM/FASTQ; ICGC Open Data (includes TCGA Open Data); COSMIC Open Data; TCGA BAM/FASTQ]
  39. 39. Raw Data Availability at EGA by Project and Data Type • https://www.ebi.ac.uk/ega/organisations/EGAO00000000024 39
  40. 40. Cooperation with EBI EGA Repository for Controlled Access Raw Data • Concerted efforts with EGA staff to support coordinated data submissions to both ICGC DCC & EGA • Infrastructure to grant controlled data access automatically on approval of ICGC DACO web application forms 40
  41. 41. What the users see? • Important to have a data portal that represents the richness of the data that we generate, but to also make sure biologists and clinicians can actually use the data & make discoveries! • Important to have a scalable technology that will support 50,000 human genomes, and thousands of concurrent users (we don’t have that many yet) 41
  42. 42. Uniform Annotations • Annotating Simple Somatic Mutations (SSM) and Simple Germline Variations (SGV) • DCC is currently implementing the snpEff software ◦ Recommended by the ICGC Bioinformatics Analysis Working Group ◦ Returns Sequence Ontology's controlled vocabulary regarding mutation-induced changes (www.sequenceontology.org) • ICGC members will not be required to annotate SSM and SGV for the ICGC data releases 42
  43. 43. http://icgc.org 43
  44. 44. 44
  45. 45. Select “Pancreatic cancer – Canada” 45
  46. 46. … But where is the data? 46
  47. 47. 47
  48. 48. http://dcc.icgc.org/ 48
  49. 49. 49
  50. 50. Highlights of the new portal: dcc.icgc.org • Faceted search capabilities for variants, genes and donors – Interactive data exploration, fast and easy • Mutation aggregation & counts across donors and cancers – # of pancreatic cancer donors with mutation KRAS G12D • Standardized gene consequence across all projects • Genome browser • Data download • Protein domains • Links to repositories 50
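The "mutation aggregation & counts across donors" highlight on slide 50 can be pictured with a small Python sketch. The records below are invented for illustration (donor IDs, cancer labels and the tuple layout are all assumptions, not portal data), and the portal's real indexing is not described in the slides. The sketch counts distinct donors per mutation and cancer type rather than raw observations.

```python
from collections import defaultdict

# Hypothetical observations: (donor_id, cancer_type, gene, protein_change).
OBSERVATIONS = [
    ("DO1", "pancreatic", "KRAS", "G12D"),
    ("DO1", "pancreatic", "KRAS", "G12D"),  # same donor, second specimen
    ("DO2", "pancreatic", "KRAS", "G12D"),
    ("DO3", "pancreatic", "KRAS", "G12V"),
    ("DO4", "colon", "KRAS", "G12D"),
]


def donors_per_mutation(observations):
    """Count distinct donors for each (cancer_type, gene, protein_change)."""
    donors = defaultdict(set)
    for donor_id, cancer_type, gene, change in observations:
        donors[(cancer_type, gene, change)].add(donor_id)
    return {key: len(ids) for key, ids in donors.items()}


if __name__ == "__main__":
    counts = donors_per_mutation(OBSERVATIONS)
    print(counts[("pancreatic", "KRAS", "G12D")])
```

Using a set per key means a donor with several specimens carrying the same mutation is counted once, which is what a "number of donors with mutation" figure implies.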
  51. 51. Technologies (Brian O’Connor / Vincent Ferretti) [Diagram; recoverable labels: Chaplin, Web GUI, Indexing, Processing & Data Model, Core] 51
  52. 52. 52
  53. 53. KRAS search 53
  54. 54. • Summary • Cancer type distribution • Other links (Cosmic, Entrez, etc) • Mutation profile in protein • Domains • Genomic Context • Mutation profile • Most common mutations 54
  55. 55. http://dcc.icgc.org/genes/ENSG00000133703 55
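The gene page address on slide 55 ends in an Ensembl gene identifier (ENSG00000133703 is KRAS, the gene searched on slide 53). A hedged Python sketch for building such addresses follows. The URL pattern is inferred from this single slide and may not match the current portal. `gene_page_url` is a hypothetical helper, not a portal API.

```python
import re

# Base address as shown on the slide; assumed to apply to other genes too.
PORTAL_GENE_BASE = "http://dcc.icgc.org/genes/"

# Human Ensembl gene IDs: the prefix ENSG followed by 11 digits.
ENSEMBL_GENE_ID = re.compile(r"ENSG\d{11}")


def gene_page_url(ensembl_gene_id: str) -> str:
    """Build a portal gene page address from an Ensembl gene ID."""
    if not ENSEMBL_GENE_ID.fullmatch(ensembl_gene_id):
        raise ValueError(f"not a human Ensembl gene ID: {ensembl_gene_id!r}")
    return PORTAL_GENE_BASE + ensembl_gene_id


if __name__ == "__main__":
    print(gene_page_url("ENSG00000133703"))
```

Validating the identifier first avoids building addresses from gene symbols such as "KRAS", which this URL form does not use on the slide.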
  56. 56. 56
  57. 57. 57
  58. 58. 58
  59. 59. http://goo.gl/qUzuAi 59
  60. 60. 60
  61. 61. Donor • Donor ID • Primary site • Cancer Project • Gender • Tumor Stage • Vital Status • Disease Status • Release type • Age at diagnosis • Available data types • Analysis types 61
  62. 62. Genes 62
  63. 63. Mutations • Consequences • Type • Platform • Verification status 63
  64. 64. Exporting data 64
  65. 65. Exporting data 65
  66. 66. 66
  67. 67. 67
  68. 68. Exporting data 68
  69. 69. Can do bulk download of the data … 69
  70. 70. [Diagram labels: ICGC BAM/FASTQ; ICGC Open Data (includes TCGA Open Data); COSMIC Open Data; TCGA BAM/FASTQ]
  71. 71. [Diagram of raw data repositories and access; recoverable labels: ERA, EGA, DACO, dbGaP, TCGA BAM, ICGC BAM, Open, Germ Line, + EGA id] 71
  72. 72. ICGC Data Categories. ICGC Open Access Datasets: • Cancer Pathology (histologic type or subtype; histologic nuclear grade) • Donor (gender; age range) • RNA expression (normalized) • DNA methylation • Genotype frequencies • Somatic mutations (SNV, CNV and Structural Rearrangement). ICGC Controlled Access Datasets: • Detailed Phenotype and Outcome Data (patient demography; risk factors; examination; surgery/drugs/radiation; sample/slide; specific histological features; protocol; analyte/aliquot) • Gene Expression (probe-level data) • Raw genotype calls (germline) • Gene-sample identifier links • Genome sequence files. Most of the data in the portal is publicly available without restriction. However, access to some data, like the germline mutations, requires authorization by the Data Access Compliance Office (DACO). 72
  73. 73. Module 1: Cancer Genomic Databases bioinformatics.ca
  74. 74. http://icgc.org/daco
  75. 75. ICGC Controlled Access Datasets • Detailed Phenotype and Outcome data (region of residence; risk factors; examination; surgery; radiation; sample; slide; specific histological features; analyte; aliquot; donor notes) • Gene Expression (probe-level data) • Raw genotype calls • Gene-sample identifier links • Genome sequence files. ICGC OA Datasets • Cancer Pathology (histologic type or subtype; histologic nuclear grade) • Patient/Person (gender, age range, vital status, survival time, relapse type, status at follow-up) • Gene Expression (normalized) • DNA methylation • Computed Copy Number and Loss of Heterozygosity • Newly discovered somatic variants http://goo.gl/w4mrV 75
  76. 76. Identify yourself. Fill out the detailed form, which includes: • Contact and Project Information • Information Technology details and procedures for keeping data secure • Data Access Agreement. All of these documents are put into a PDF file that you print and get your institution to sign off on your behalf.
  77. 77.
  78. 78.
  79. 79.
  80. 80.
  81. 81.
  82. 82.
  83. 83. DACO approved projects: 59 groups - 75% academic (~400 people)
  84. 84. DACO/DCC User Data Access Process • Users approved through DACO are now automatically granted access to ICGC controlled access datasets available through the ICGC Data Portal and the EBI’s EGA repository. [Flow diagram labels: DACO Web Application; application approved by DACO; DCC User Registry; user accounts activated; DCC Data Portal; EBI EGA] 84
  85. 85. Future Work for the DCC • Work with projects to improve in a number of areas: – clinical data content, – Increasing frequency of data release • Better metadata collection from the EGA – Working with EGA to better match metadata requirements for ICGC member submissions; will enable reliable linking by Sample ID, Donor ID, etc. between data portal and EGA. Will allow direct link to DACO approved users – Projects will be required to provide this required metadata at submission time, existing EGA datasets will be updated. • Improve access to projects’ analysis methods – Suggested publishing analysis SOPs in Standards in Genomic Sciences at most recent ICGC workshop; haven’t seen any interest in doing this from member projects. – DCC to host centralized web page(s) for each project’s analysis methods; use permalink in submission files. • Better documentation … always need more! • Better transparency of processes • Better links to publications 85
  86. 86. Future Work for the DCC • New releases: – Release 15: finished before Christmas • All data submissions sent in again, plus new data • (no methylation data) – Release 16: incremental submission + Methylation data, released before May – Release 17: adopt incremental for all data types, and increase frequency of releases. 86
  87. 87. New Project: ICGC PANCANCER analysis • 2,000 Whole genome sequencing – 6 cloud infrastructures across the world – Appropriate policy and tool availability – Agreed upon shared pipelines, and others – Shared datasets – Petabytes of files, 10,000s of cores – Mutation analysis, as well as CNV, Structural, others when feasible (RNA and methylome). 87
  88. 88. Challenges and Opportunity • Targeted sequencing for Patient Selection • Consent • Combinations • Corrected features and #features >> #samples • Noisy and incomplete data • Speed and cost. We are also hiring! Adapted from Paul Rejto, Pfizer 88
  89. 89. FGED’s mission: To be a positive agent of change in the effective sharing and reproducibility of functional genomic data fged.org 89
  90. 90. Acknowledgments http://oicr.on.ca ICGC Project leaders at the OICR: Ouellette Lab • FGED Michelle Brazas Emilie Chautard Nina Palikuca Matthew Ziembicki Alvis Brazma Roger Bumgarner Cesare Furlanello Michael Miller Francis Ouellette John Quackenbush – Dana-Farber Michael Reich Gabriella Rustici Chris Stoeckert Ronald Taylor Steve Trutane Jennifer Weller Brian Wilhelm Neil Winegarden • Tom Hudson • John McPherson • Lincoln Stein • Paul Boutros • Lakshmi Muthuswamy • Vincent Ferretti • Francis Ouellette • Jennifer Jennings DCC Software Developer Vincent Ferretti Brian O’Connor Junjun Zhang Anthony Cros Jonathan Guberman Bob Tiernay Shane Wilson Long Yao Daniel Chang Jerry Lam Stuart Watt … and all the patients and their families that are putting their hopes into our work! http://icgc.org Web Dev Miyuki Fukuma Kamen Wu Joseph Yamada Salman Badr Pipeline Development & Evaluation Morgan Taschuk Rob Denroche Peter Ruzanov Zhibin Lu DCC Data Coordinator Hardeep Nahal 90
  91. 91. Informatics and Biocomputing at the OICR 91
  92. 92. Maya et Pascale, 2012 92
  93. 93. http://icgc.org info@icgc.org This presentation: http://goo.gl/HP613K Video tutorial: https://vimeo.com/75522669 93
