The International Cancer Genome
Consortium (ICGC) Data Coordinating
Center (DCC)
November 14th 2013

B.F. Francis Ouellette
francis@oicr.on.ca
• Senior Scientist & Associate Director, Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, ON
• Associate Professor, Department of Cell and Systems Biology, University of Toronto, Toronto, ON.

2
You are free to:
Copy, share, adapt, or re-mix;
Photograph, film, or broadcast;

Blog, live-blog, or post video of;

This presentation. Provided that:
You attribute the work to its author and respect the rights
and licenses associated with its components.

Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero.
Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at:
http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites

3
Slides are on slideshare.net

• http://www.slideshare.net/bffo/ebi-oncogenomics-nov2013ouellettever03

http://goo.gl/HP613K

4
E-mail

francis@oicr.on.ca

@bffo

5
Disclaimer

I do not (and will not) profit in any way, shape or form,
from any of the brands, products or companies I may
mention.

6
7
Cancer therapy is like
beating the dog with
a stick to get rid of
his fleas.

- Anna Deavere Smith,
Let me down easy

8
http://goo.gl/Yhbsj
9
The revolution in cancer
research can be summed up
in a single sentence:
cancer is in essence,
a genetic disease.
- Bert Vogelstein

10
Cancer
A Disease of the Genome

Challenge in Treating Cancer:
• Every tumor is different
• Every cancer patient is different
11
Large-Scale Studies of Cancer Genomes
• Johns Hopkins
– > 18,000 genes analyzed for mutations
– 11 breast and 11 colon tumors
– L.D. Wood et al, Science, Oct. 2007
• Wellcome Trust Sanger Institute
– 518 genes analyzed for mutations
– 210 tumors of various types
– C. Greenman et al, Nature, Mar. 2007
• TCGA (NIH)
– Multiple technologies
– brain (glioblastoma multiforme), lung (squamous carcinoma), and ovarian (serous cystadenocarcinoma)
– F.S. Collins & A.D. Barker, Sci. Am, Mar. 2007
12
Lessons learned
• Heterogeneity within and across tumor types
• High rate of abnormalities (driver vs passenger)
• Sample quality matters

13
2007
International Cancer Genome Consortium
• Collect ~500 tumour/normal pairs from each of 50 different major
cancer types;
• Comprehensive genome analysis of each T/N pair:
– Genome
– Transcriptome
– Methylome
– Clinical data

• Make the data available to the research community & public.

Identify genome changes

…GATTATTCCAGGTAT…
…GATTATTGCAGGTAT…
…GATTATT    GCAGGTAT…
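The first two sequence fragments on this slide differ at a single base. A minimal sketch of how such a change could be found by position-wise comparison is shown here. Treating the first fragment as the normal sample and the second as the tumour sample is an assumption made only for illustration, and `find_substitutions` is a hypothetical helper, not part of any ICGC tool.

```python
def find_substitutions(reference: str, sample: str):
    """Return (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length for a position-wise comparison")
    return [
        (index + 1, ref_base, alt_base)
        for index, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


# Fragments copied from the slide; which one is "normal" is assumed here.
normal_fragment = "GATTATTCCAGGTAT"
tumour_fragment = "GATTATTGCAGGTAT"

print(find_substitutions(normal_fragment, tumour_fragment))  # [(8, 'C', 'G')]
```

Real somatic mutation calling works on aligned sequencing reads with quality and coverage information; this comparison only illustrates the idea of a tumour/normal difference.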

15
Rationale for the ICGC
• The scope is huge, such that no country can do it all.
• Coordinated cancer genome initiatives will reduce
duplication of effort for common and easy to acquire
tumor samples and ensure complete studies for many
less frequent forms of cancer.
• Standardization and uniform quality measures across
studies will enable the merging of datasets, increasing
power to detect additional targets.
• The spectrum of many cancers varies across the
world for many tumor types, because of environmental,
genetic and other causes.
• The ICGC will accelerate the dissemination of genomic
and analytical methods across participating sites, and
the user community

16
International Cancer Genome Consortium
(ICGC)
Goals
• Catalogue genomic abnormalities in tumors in 50 different cancer types and/or subtypes of clinical and societal importance across the globe
• Generate complementary catalogues of transcriptomic and epigenomic datasets from the same tumors
• Make the data available to the research community rapidly with minimal restrictions to accelerate research into the causes and control of cancer

50 tumor types and/or subtypes
500 tumors + 500 controls per subtype
50,000 Human Genome Projects!

Nature (2010) 464:993
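The "50,000 Human Genome Projects" figure follows from the numbers on this slide. A short worked calculation, assuming one genome per tumour and one per matched control:

```python
cancer_types = 50          # tumor types and/or subtypes
tumours_per_type = 500     # tumour genomes per subtype
controls_per_type = 500    # matched control genomes per subtype

total_genomes = cancer_types * (tumours_per_type + controls_per_type)
print(f"{total_genomes:,} genomes")  # 50,000 genomes
```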

17
Analysis Data Types
• Simple Somatic Mutations
• Copy Number Alterations
• Structural Somatic Mutations
• Gene Expression (micro-arrays and RNASeq)
• miRNA Expression (RNASeq)
• Epigenomics (Arrays and Methylation)
• Splicing Variation
• Protein Expression
18
19
OICR’s mission

To build innovative research
programs that will have an impact
on the prevention, early detection,
diagnosis and treatment of
cancer.
20
OICR Informatics & Biocomputing Senior Staff

• Lincoln Stein: Director, I&B; Sr. PI
• Vincent Ferretti: Assoc. Director, Bioinf. Software Dev; Sr. PI
• Francis Ouellette: Assoc. Director, I&B
• Paul Boutros: Jr. PI
• Lakshmi Muthuswamy: Jr. PI
• David Sutton: Director, IT
• Brian Shoichet: Sr. PI (May 2013)
• Tatiana Lomasko: Program Manager
• Jared Simpson: OICR Fellow (May 2013)

21
http://icgc.org

22
ICGC Map – November 2013
67 projects launched

23
ICGC Committees & Working Groups

http://icgc.org/icgc/committees-and-working-groups

24
ICGC Project Teams @ OICR
• ICGC Secretariat
– Executive Chair: Thomas Hudson
– Senior Project Manager: Jennifer Jennings
– Administrative Coordinator: Jaypee Banlawi
• (with the support of the Web Development team)

• ICGC Data Coordination Center (DCC)
– DCC Leader: Lincoln Stein
– DCC Co-Leader: Francis Ouellette
– DCC Software Development Team Leader: Vincent
Ferretti (+6 FTE)
– DCC Data Curation: Hardeep Nahal (+1 FTE)
25
DCC Activities
DCC activities are split between two groups:
• Software Development
– DCC portal
– Submission tool

• Curation (and Content Management)
– Data level management
– Submitter “handling”
– Coordination with secretariat
– User support

http://dcc.icgc.org/team
26
ICGC Data Coordination Centre
A “comprehensive management system” providing:
• Secure mechanism for uploading data
• Track uploads and perform integrity checks
• Regular progress reporting (data audit)
• Quality checks (coverage, correctness, etc.)
• Enable distribution of raw data to public repositories
• Provide essential metadata to public repositories
• Integrate with other public repositories via standard data formats, ontologies, etc.

27
ICGC Data Coordination Centre (2)
Provides the following support to experimental
biologists, computational biologists, and other
researchers:
• Download of complete dataset, or subsets
• Restrict protected data to authorized users (controlled access)
• Search data by gene or specimen, or lists thereof
• Interactive system for identifying specimens of interest, finding what data sets are available for those specimens, selecting data slices across those specimens (e.g., counts of the number of somatic mutations observed in a region within the UTR of a gene of interest), and running basic analytic tests on those data slices
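The "data slice" example in the last bullet (counting somatic mutations observed in a region of interest) can be sketched as below. The `Mutation` record, the donor identifiers, and the coordinates are all hypothetical and invented for illustration; they do not reflect the DCC data model.

```python
from typing import Dict, Iterable, NamedTuple


class Mutation(NamedTuple):
    donor_id: str
    chromosome: str
    position: int  # 1-based coordinate


def count_mutations_in_region(
    mutations: Iterable[Mutation], chromosome: str, start: int, end: int
) -> Dict[str, int]:
    """Count mutations per donor that fall inside [start, end] on one chromosome."""
    counts: Dict[str, int] = {}
    for mutation in mutations:
        if mutation.chromosome == chromosome and start <= mutation.position <= end:
            counts[mutation.donor_id] = counts.get(mutation.donor_id, 0) + 1
    return counts


# Made-up records, for illustration only.
example_mutations = [
    Mutation("donor_A", "12", 1050),
    Mutation("donor_A", "12", 1100),
    Mutation("donor_B", "12", 1075),
    Mutation("donor_B", "7", 1075),
    Mutation("donor_C", "12", 5000),
]

print(count_mutations_in_region(example_mutations, "12", 1000, 1200))
# {'donor_A': 2, 'donor_B': 1}
```

A linear scan is enough for a sketch; a production system would use an index over genomic coordinates.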

28
ICGC Data Types
• Clinical Data
– Hosted by DCC via data portal
– Was 100% open access, but currently 9 data elements have been flagged by DACO
as controlled access and are under review by IDAC

• Experimental Analysis Data
– Hosted by DCC via data portal
– Somatic is open access, germline is controlled

• “Raw” Sequencing Data (+ array data, etc.)
– Hosted at other public repositories
– Primary repository for ICGC sequence data is EBI EGA
– TCGA raw data hosted at CGHub

29
ICGC datasets to date (Hardeep Nahal)

[Figure: ICGC Data Portal Cumulative Donor Count for Member Projects. Y-axis: number of donors (1,000 to 10,000); x-axis: Dec 2011 to Sept 2013; points labelled Release 7 through Release 14.]

30
ICGC dataset version 14
September 2013

Hardeep Nahal

• Cancer types: 41
• Donors: 8,532 (18,056 specimens)
• Simple somatic mutations: 1,995,134
• Copy number mutations: 18,526,593
• Structural rearrangements: 18,614
• Genes affected* by simple somatic mutations: 22,074
• Genes affected* by non-synonymous coding mutations: 19,150
• Genes affected* by copy number mutations: 20,341
• Genes affected* by structural rearrangements: 1,884

*out of 22,259 protein coding genes annotated in Ensembl Human release 69

• Open tier and controlled data currently available
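The "genes affected" counts on this slide are stated against a denominator of 22,259 protein coding genes. A small sketch that turns them into percentages, using only the numbers given above (`percent_affected` is a hypothetical helper name):

```python
TOTAL_PROTEIN_CODING_GENES = 22_259  # Ensembl Human release 69, per the slide

genes_affected = {
    "simple somatic mutations": 22_074,
    "non-synonymous coding mutations": 19_150,
    "copy number mutations": 20_341,
    "structural rearrangements": 1_884,
}


def percent_affected(count: int, total: int = TOTAL_PROTEIN_CODING_GENES) -> float:
    """Share of annotated protein coding genes with at least one such event."""
    return 100.0 * count / total


for label, count in genes_affected.items():
    print(f"{label}: {percent_affected(count):.1f}% of protein coding genes")
```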
Key DCC Activities for 2013
• Improved data & metadata curation at EGA; better
linking of data held at DCC to ICGC data in other
repositories (currently not perfect)
• Improved data quality/integrity checking through
new submission/validation system; review of
submission file specifications
• Integration of new data submission system and
portal infrastructure with project and user
information managed at ICGC.org
32
Moratorium:
http://www.icgc.org/icgc/goals-structure-policies-guidelines/e3-publication-policy

33
Where do you find that information?
• We actually make it hard to find, but we are
working on that! (this is an example of where ICGC
would like to do what TCGA does!)
• http://cancergenome.nih.gov/publications/publicationguidelines

34
Where do you find that information?
For ICGC data:
• Need to find the policy!
• http://icgc.org/icgc/goals-structure-policies-guidelines/e3-publication-policy
• Find text:
– Published → no embargo
– < 100 tumors → 2 years
– > 100 tumors → 1 year
• Find date: in the README file on the FTP site
– (exception in README)
• This is bad, we know it, and we are fixing it!

• In doubt? Contact us! info@icgc.org

35
Time limits for publication moratoriums:
All data shall become free of a publication
moratorium when either:
1) the data is published by the ICGC member project
2) one year after a specified quantity of data (e.g.
genome dataset from 100 tumours per project)
has been released via the ICGC database or
other public databases.
3) In all cases data shall be free of a publication
moratorium two years after its initial release.
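The three time limits above can be read as "whichever comes first". The sketch below works under that reading; `moratorium_end` and its parameter names are hypothetical, and the dates are invented. The actual embargo date for a dataset should be taken from the ICGC publication policy and the project README.

```python
from datetime import date
from typing import Optional


def add_years(day: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


def moratorium_end(
    initial_release: date,
    threshold_release: Optional[date] = None,  # when the specified quantity (e.g. 100 tumours) was released
    published: Optional[date] = None,          # when the member project published the data
) -> date:
    """Earliest date on which the data is free of the publication moratorium."""
    candidates = [add_years(initial_release, 2)]          # rule 3: two years in all cases
    if threshold_release is not None:
        candidates.append(add_years(threshold_release, 1))  # rule 2: one year after threshold
    if published is not None:
        candidates.append(published)                      # rule 1: published by the project
    return min(candidates)


# Invented dates, for illustration only.
print(moratorium_end(date(2012, 1, 15)))                                      # 2014-01-15
print(moratorium_end(date(2012, 1, 15), threshold_release=date(2012, 6, 1)))  # 2013-06-01
```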

36
[Diagram; labels: ERA, Open, TCGA, dbGaP, BAM, DACO, EGA, ICGC, Germ Line, + EGA id]
37
[Diagram; labels: ICGC BAM/FASTQ; ICGC Open Data (includes TCGA Open Data); COSMIC Open Data; TCGA BAM/FASTQ]
Raw Data Availability at EGA by Project and Data Type
• https://www.ebi.ac.uk/ega/organisations/EGAO00000000024

39
Cooperation with EBI EGA Repository for
Controlled Access Raw Data
• Concerted efforts with EGA staff to support
coordinated data submissions to both ICGC DCC
& EGA
• Infrastructure to grant controlled data access
automatically on approval of ICGC DACO web
application forms

40
What do the users see?
• Important to have a data portal that represents the
richness of the data that we generate, but to also
make sure biologists and clinicians can actually
use the data & make discoveries!
• Important to have a scalable technology that will
support 50,000 human genomes, and thousands of
concurrent users (we don’t have that many yet)

41
Uniform Annotations
• Annotating Simple Somatic Mutations (SSM) and Simple
Germline Variations (SGV)
• DCC is currently implementing the snpEff software
◦ Recommended by the ICGC Bioinformatics Analysis
Working Group
◦ Returns Sequence Ontology's controlled vocabulary
regarding mutation-induced changes
(www.sequenceontology.org)

• ICGC members will not be required to annotate
SSM and SGV for the ICGC data releases
42
http://icgc.org

43
44
Select “Pancreatic cancer – Canada”

45
… But where is the data?

46
47
http://dcc.icgc.org/

48
49
Highlights of the new portal: dcc.icgc.org
• Faceted search capabilities for variants, genes and donors
– Interactive data exploration, fast and easy
• Mutation aggregation & counts across donors and cancers
– # of pancreatic cancer donors with mutation KRAS G12D
• Standardized gene consequence across all projects
• Genome browser
• Data download
• Protein domains
• Links to repositories
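The "mutation aggregation & counts across donors and cancers" highlight amounts to counting distinct donors per cancer project for a given mutation. A minimal sketch, using invented records and hypothetical field and project names (this is not the portal's implementation or data model):

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def donors_with_mutation(
    observations: Iterable[Mapping[str, str]], gene: str, aa_change: str
) -> Dict[str, int]:
    """Number of distinct donors per project carrying the given mutation."""
    donors_by_project = defaultdict(set)
    for observation in observations:
        if observation["gene"] == gene and observation["aa_change"] == aa_change:
            # A set keeps each donor once even if the mutation is reported twice.
            donors_by_project[observation["project"]].add(observation["donor_id"])
    return {project: len(donors) for project, donors in donors_by_project.items()}


# Made-up observations, for illustration only.
example_observations = [
    {"donor_id": "D1", "project": "pancreatic-example", "gene": "KRAS", "aa_change": "G12D"},
    {"donor_id": "D1", "project": "pancreatic-example", "gene": "KRAS", "aa_change": "G12D"},
    {"donor_id": "D2", "project": "pancreatic-example", "gene": "KRAS", "aa_change": "G12D"},
    {"donor_id": "D3", "project": "pancreatic-example", "gene": "KRAS", "aa_change": "G12V"},
    {"donor_id": "D4", "project": "colon-example", "gene": "KRAS", "aa_change": "G12D"},
]

print(donors_with_mutation(example_observations, "KRAS", "G12D"))
# {'pancreatic-example': 2, 'colon-example': 1}
```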

50
Technologies

[Diagram; labels: Web GUI, Chaplin, Indexing, Processing & Data Model, Core]
Brian O’Connor / Vincent Ferretti

51
52
KRAS search

53
• Summary
• Cancer type distribution
• Other links (Cosmic, Entrez, etc)
• Mutation profile in protein
• Domains
• Genomic Context
• Mutation profile
• Most common mutations

54
http://dcc.icgc.org/genes/ENSG00000133703
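The gene page above is addressed by Ensembl gene ID (ENSG00000133703 is KRAS). Assuming other gene pages follow the same `/genes/<Ensembl gene ID>` pattern as the URL on this slide, a link could be built as sketched here; `gene_page_url` is a hypothetical helper, and the pattern is not confirmed for any other page.

```python
import re

# Human Ensembl gene IDs are "ENSG" followed by 11 digits.
ENSEMBL_GENE_ID = re.compile(r"ENSG\d{11}")


def gene_page_url(ensembl_gene_id: str, base_url: str = "http://dcc.icgc.org") -> str:
    """Build a gene page URL following the pattern shown on the slide."""
    if not ENSEMBL_GENE_ID.fullmatch(ensembl_gene_id):
        raise ValueError(f"not a human Ensembl gene ID: {ensembl_gene_id!r}")
    return f"{base_url}/genes/{ensembl_gene_id}"


print(gene_page_url("ENSG00000133703"))  # http://dcc.icgc.org/genes/ENSG00000133703
```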

55
56
57
58
http://goo.gl/qUzuAi

59
60
Donor
• Donor ID
• Primary site
• Cancer Project
• Gender
• Tumor Stage
• Vital Status
• Disease Status
• Release type
• Age at diagnosis
• Available data types
• Analysis types

61
Genes

62
Mutations
• Consequences
• Type
• Platform
• Verification status

63
Exporting data

64
Exporting data

65
66
67
Exporting data

68
Can do bulk download of the data …

69
[Diagram; labels: ICGC BAM/FASTQ; ICGC Open Data (includes TCGA Open Data); COSMIC Open Data; TCGA BAM/FASTQ]
[Diagram; labels: ERA, Open, TCGA, dbGaP, BAM, DACO, EGA, ICGC, Germ Line, + EGA id]
71
ICGC Data Categories

ICGC Open Access Datasets
• Cancer Pathology
– Histologic type or subtype
– Histologic nuclear grade
• Donor
– Gender
– Age range
• RNA expression (normalized)
• DNA methylation
• Genotype frequencies
• Somatic mutations (SNV, CNV and Structural Rearrangement)

ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome Data
– Patient demography
– Risk factors
– Examination
– Surgery/Drugs/Radiation
– Sample/Slide
– Specific histological features
– Protocol
– Analyte/Aliquot
• Gene Expression (probe-level data)
• Raw genotype calls (germline)
• Gene-sample identifier links
• Genome sequence files

Most of the data in the portal is publicly available without restriction. However, access to some data, such as germline mutations, requires authorization by the Data Access Compliance Office (DACO).
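The open versus controlled split on this slide can be expressed as a simple lookup. The sketch below copies the category names from the slide; the functions `access_tier` and `needs_daco_approval` are hypothetical and only illustrate how a request touching any controlled category would need DACO authorization.

```python
from typing import Iterable

OPEN_ACCESS = {
    "cancer pathology",
    "donor gender",
    "donor age range",
    "rna expression (normalized)",
    "dna methylation",
    "genotype frequencies",
    "somatic mutations",
}

CONTROLLED_ACCESS = {
    "detailed phenotype and outcome data",
    "gene expression (probe-level data)",
    "raw genotype calls (germline)",
    "gene-sample identifier links",
    "genome sequence files",
}


def access_tier(category: str) -> str:
    """Return 'open' or 'controlled' for a data category named on the slide."""
    key = category.strip().lower()
    if key in OPEN_ACCESS:
        return "open"
    if key in CONTROLLED_ACCESS:
        return "controlled"
    raise KeyError(f"unknown data category: {category!r}")


def needs_daco_approval(categories: Iterable[str]) -> bool:
    """True if any requested category is controlled access."""
    return any(access_tier(category) == "controlled" for category in categories)


print(needs_daco_approval(["Somatic mutations", "DNA methylation"]))       # False
print(needs_daco_approval(["Somatic mutations", "Genome sequence files"]))  # True
```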
72
Module 1: Cancer Genomic Databases

bioinformatics.ca
http://icgc.org/daco

ICGC Controlled
Access Datasets
• Detailed Phenotype and Outcome data
Region of residence
Risk factors
Examination
Surgery
Radiation
Sample
Slide
Specific histological features
Analyte
Aliquot
Donor notes
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files

ICGC OA
Datasets
• Cancer Pathology
Histologic type or subtype
Histologic nuclear grade
• Patient/Person
Gender, Age range,
Vital status, Survival time
Relapse type, Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and
Loss of Heterozygosity
• Newly discovered somatic variants
http://goo.gl/w4mrV

75
Identify yourself

Fill out the detailed form, which includes:
• Contact and Project Information
• Information Technology details and procedures for keeping data secure
• Data Access Agreement

All of these documents are put into a PDF file that you print and get your institution to sign off on your behalf.
DACO approved projects:
59 groups - 75% academic
(~400 people)

DACO/DCC User Data Access Process
• Users approved through DACO are now automatically granted access to ICGC controlled access datasets available through the ICGC Data Portal and the EBI’s EGA repository

[Diagram; labels: DACO Web Application; application approved by DACO; DCC User Registry; user accounts activated; DCC Data Portal; EBI EGA]
84
Future Work for the DCC
• Work with projects to improve in a number of areas:
– Clinical data content
– Increasing frequency of data release
• Better metadata collection from the EGA
– Working with EGA to better match metadata requirements for ICGC member submissions; this will enable reliable linking by Sample ID, Donor ID, etc. between the data portal and EGA, and will allow direct links to DACO-approved users
– Projects will be required to provide this metadata at submission time; existing EGA datasets will be updated
• Improve access to projects’ analysis methods
– Publishing analysis SOPs in Standards in Genomic Sciences was suggested at the most recent ICGC workshop; member projects have not shown interest in doing this
– DCC to host centralized web page(s) for each project’s analysis methods; use a permalink in submission files
• Better documentation … always need more!
• Better transparency of processes
• Better links to publications
85
Future Work for the DCC

• New releases:
– Release 15: finished before Christmas
• All data submissions sent in again, plus new data
• (no methylation data)

– Release 16: incremental submission + Methylation data,
released before May
– Release 17: adopt incremental for all data types, and
increase frequency of releases.

86
New Project: ICGC PANCANCER analysis
• 2,000 Whole genome sequencing
– 6 cloud infrastructures across the world
– Appropriate policy and tool availability
– Agreed upon shared pipelines, and others
– Shared datasets
– Petabytes of files, 10,000s of cores
– Mutation analysis, as well as CNV, Structural, others when feasible (RNA and methylome)

87
Challenges and Opportunity
• Targeted sequencing for Patient
Selection
• Consent
• Combinations
• Corrected features and #features >>
#samples
• Noisy and incomplete data
• Speed and cost
We are also hiring!

Adapted from Paul Rejto, Pfizer

88
FGED’s mission:

To be a positive agent of
change in the effective
sharing and reproducibility
of functional genomic data

fged.org

89
Acknowledgments

http://oicr.on.ca

ICGC Project leaders at the OICR:
• Tom Hudson
• John McPherson
• Lincoln Stein
• Paul Boutros
• Lakshmi Muthuswamy
• Vincent Ferretti
• Francis Ouellette
• Jennifer Jennings

Ouellette Lab
Michelle Brazas
Emilie Chautard
Nina Palikuca
Matthew Ziembicki

FGED
Alvis Brazma
Roger Bumgarner
Cesare Furlanello
Michael Miller
Francis Ouellette
John Quackenbush (Dana-Farber)
Michael Reich
Gabriella Rustici
Chris Stoeckert
Ronald Taylor
Steve Trutane
Jennifer Weller
Brian Wilhelm
Neil Winegarden

DCC Software
Developer
Vincent Ferretti
Brian O’Connor
Junjun Zhang
Anthony Cros
Jonathan Guberman
Bob Tiernay
Shane Wilson
Long Yao
Daniel Chang
Jerry Lam
Stuart Watt

… and all the patients and their
families that are putting their
hopes into our work!

http://icgc.org

Web Dev
Miyuki Fukuma
Kamen Wu
Joseph Yamada
Salman Badr
Pipeline Development
& Evaluation
Morgan Taschuk
Rob Denroche
Peter Ruzanov
Zhibin Lu
DCC Data Coordinator
Hardeep Nahal

90
Informatics and Biocomputing at the OICR

91
Maya et Pascale, 2012

92
http://icgc.org
info@icgc.org
This presentation: http://goo.gl/HP613K
Video tutorial: https://vimeo.com/75522669
93

The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Sapana Sha
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 

Recently uploaded (20)

Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
CĂłdigo Creativo y Arte de Software | Unidad 1
CĂłdigo Creativo y Arte de Software | Unidad 1CĂłdigo Creativo y Arte de Software | Unidad 1
CĂłdigo Creativo y Arte de Software | Unidad 1
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 

ICGC Data Coordination Center (DCC) Overview

  • 1. The International Cancer Genome Consortium (ICGC) Data Coordinating Center (DCC) November 14th 2013 B.F. Francis Ouellette, francis@oicr.on.ca. Senior Scientist & Associate Director, Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, ON. Associate Professor, Department of Cell and Systems Biology, University of Toronto, Toronto, ON.
  • 2. Module #: Title of Module 2
  • 3. You are free to: Copy, share, adapt, or re-mix; Photograph, film, or broadcast; Blog, live-blog, or post video of; This presentation. Provided that: You attribute the work to its author and respect the rights and licenses associated with its components. Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero. Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at; http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites 3
  • 4. Slides are on slideshare.net • http://www.slideshare.net/bffo/ebi-oncogenomics-nov2013ouellettever03 http://goo.gl/HP613K 4
  • 6. Disclaimer I do not (and will not) profit in any way, shape or form, from any of the brands, products or companies I may mention. 6
  • 7. 7
  • 8. Cancer therapy is like beating the dog with a stick to get rid of his fleas. - Anna Deavere Smith, Let me down easy 8
  • 10. The revolution in cancer research can be summed up in a single sentence: cancer is, in essence, a genetic disease. - Bert Vogelstein 10
  • 11. Cancer: A Disease of the Genome. Challenge in Treating Cancer:
      • Every tumor is different
      • Every cancer patient is different
  • 12. Large-Scale Studies of Cancer Genomes
      • Johns Hopkins: > 18,000 genes analyzed for mutations; 11 breast and 11 colon tumors. L.D. Wood et al, Science, Oct. 2007
      • Wellcome Trust Sanger Institute: 518 genes analyzed for mutations; 210 tumors of various types. C. Greenman et al, Nature, Mar. 2007
      • TCGA (NIH): Multiple technologies; brain (glioblastoma multiforme), lung (squamous carcinoma), and ovarian (serous cystadenocarcinoma). F.S. Collins & A.D. Barker, Sci. Am, Mar. 2007
  • 13. Lessons learned
      • Heterogeneity within and across tumor types
      • High rate of abnormalities (driver vs passenger)
      • Sample quality matters
  • 14. 2007
  • 15. International Cancer Genome Consortium
      • Collect ~500 tumour/normal pairs from each of 50 different major cancer types;
      • Comprehensive genome analysis of each T/N pair:
          – Genome
          – Transcriptome
          – Methylome
          – Clinical data
      • Make the data available to the research community & public.
      Identify genome changes: …GATTATTCCAGGTAT… …GATTATTGCAGGTAT… GCAGGTAT… …GATTATT
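The "identify genome changes" idea on slide 15 can be sketched with the two complete 15-base fragments shown there. This is only an illustration: which fragment is the normal and which is the tumour is an assumption made here, and `find_substitutions` is a hypothetical helper, not part of any ICGC pipeline. Real somatic variant calling works from aligned sequencing reads and statistical models rather than a direct string comparison.

```python
def find_substitutions(normal: str, tumour: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, normal base, tumour base) for each mismatch."""
    if len(normal) != len(tumour):
        raise ValueError("sequences must be the same length")
    return [
        (i + 1, n, t)
        for i, (n, t) in enumerate(zip(normal, tumour))
        if n != t
    ]


# Fragments copied from the slide; the normal/tumour labels are assumed.
normal_fragment = "GATTATTCCAGGTAT"
tumour_fragment = "GATTATTGCAGGTAT"

changes = find_substitutions(normal_fragment, tumour_fragment)
print(changes)
```

Under these assumptions the only difference is a single base substitution (C to G) at the eighth position of the fragment.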
  • 16. Rationale for the ICGC • The scope is huge, such that no country can do it all. • Coordinated cancer genome initiatives will reduce duplication of effort for common and easy-to-acquire tumor samples and ensure complete studies for many less frequent forms of cancer. • Standardization and uniform quality measures across studies will enable the merging of datasets, increasing power to detect additional targets. • The spectrum of many cancers varies across the world for many tumor types, because of environmental, genetic and other causes. • The ICGC will accelerate the dissemination of genomic and analytical methods across participating sites, and the user community 16
  • 17. International Cancer Genome Consortium (ICGC) Goals • Catalogue genomic abnormalities in tumors in 50 different cancer types and/or subtypes of clinical and societal importance across the globe • Generate complementary catalogues of transcriptomic and epigenomic datasets from the same tumors • Make the data available to research community rapidly with minimal restrictions to accelerate research into the causes and control of cancer 50 tumor types and/or subtypes 500 tumors + 500 controls per subtype 50,000 Human Genome Projects! Nature (2010) 464:993 17
  • 18. Analysis Data Types
      • Simple Somatic Mutations
      • Copy Number Alterations
      • Structural Somatic Mutations
      • Gene Expression (micro-arrays and RNASeq)
      • miRNA Expression (RNASeq)
      • Epigenomics (Arrays and Methylation)
      • Splicing Variation
      • Protein Expression
  • 19. 19
  • 20. OICR’s mission To build innovative research programs that will have an impact on the prevention, early detection, diagnosis and treatment of cancer. 20
  • 21. OICR Informatics & Biocomputing Senior Staff: Lincoln Stein (Director, I&B; Sr. PI); Vincent Ferretti (Assoc. Director, Bioinf. Software Dev; Sr. PI); Francis Ouellette (Assoc. Director, I&B); Paul Boutros (Jr. PI); Lakshmi Muthuswamy (Jr. PI); David Sutton (Director, IT); Brian Shoichet (Sr. PI, May 2013); Tatiana Lomasko (Program Manager); Jared Simpson (OICR Fellow, May 2013) 21
  • 23. ICGC Map – November 2013 67 projects launched 23
  • 24. ICGC Committees & Working Groups http://icgc.org/icgc/committees-and-working-groups 24
  • 25. ICGC Project Teams @ OICR • ICGC Secretariat – Executive Chair: Thomas Hudson – Senior Project Manager: Jennifer Jennings – Administrative Coordinator: Jaypee Banlawi • (with the support of the Web Development team) • ICGC Data Coordination Center (DCC) – DCC Leader: Lincoln Stein – DCC Co-Leader: Francis Ouellette – DCC Software Development Team Leader: Vincent Ferretti (+6 FTE) – DCC Data Curation: Hardeep Nahal (+1 FTE) 25
  • 26. DCC Activities. DCC activities are split between two groups:
      • Software Development
          – DCC portal
          – Submission tool
      • Curation (and Content Management)
          – Data level management
          – Submitter “handling”
          – Coordination with secretariat
          – User support
      http://dcc.icgc.org/team
  • 27. ICGC Data Coordination Centre. A “comprehensive management system” providing:
      • Secure mechanism for uploading data
      • Track uploads and perform integrity checks
      • Regular progress reporting (data audit)
      • Quality checks (coverage, correctness, etc.)
      • Enable distribution of raw data to public repositories
      • Provide essential metadata to public repositories
      • Integrate with other public repositories via standard data formats, ontologies, etc.
  • 28. ICGC Data Coordination Centre (2). Provides the following support to experimental biologists, computational biologists, and other researchers:
      • Download of complete dataset, or subsets
      • Restrict protected data to authorized users (controlled access)
      • Search data by gene or specimen, or lists thereof
      • Interactive system for identifying specimens of interest, finding what data sets are available for those specimens, selecting data slices across those specimens (e.g., counts of the number of somatic mutations observed in a region within the UTR of a gene of interest), and running basic analytic tests on those data slices
  • 29. ICGC Data Types
      • Clinical Data
          – Hosted by DCC via data portal
          – Was 100% open access, but currently 9 data elements have been flagged by DACO as controlled access and are under review by IDAC
      • Experimental Analysis Data
          – Hosted by DCC via data portal
          – Somatic is open access, germline is controlled
      • “Raw” Sequencing Data (+ array data, etc.)
          – Hosted at other public repositories
          – Primary repository for ICGC sequence data is EBI EGA
          – TCGA raw data hosted at CGhub
  • 30. ICGC datasets to date (Hardeep Nahal). [Figure: ICGC Data Portal cumulative donor count for member projects, Release 7 through Release 14; x-axis: Dec 2011 to Sept 2013; y-axis: number of donors, 1000 to 10,000] 30
  • 31. ICGC dataset version 14, September 2013 (Hardeep Nahal)
      • Cancer types: 41
      • Donors: 8,532 (18,056 specimens)
      • Simple somatic mutations: 1,995,134
      • Copy number mutations: 18,526,593
      • Structural rearrangements: 18,614
      • Genes affected* by simple somatic mutations: 22,074
      • Genes affected* by non-synonymous coding mutations: 19,150
      • Genes affected* by copy number mutations: 20,341
      • Genes affected* by structural rearrangements: 1,884
      • *out of 22,259 protein coding genes annotated in Ensembl Human release 69
      • Open tier and controlled data currently available
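To put the "genes affected" counts on slide 31 in proportion, the short sketch below divides each count by the 22,259 protein-coding genes quoted on the slide. The numbers are copied from the slide; the variable names are illustrative only.

```python
# Denominator quoted on the slide (Ensembl Human release 69).
PROTEIN_CODING_GENES = 22_259

# Genes affected by each mutation class in ICGC release 14, per the slide.
genes_affected = {
    "simple somatic mutations": 22_074,
    "non-synonymous coding mutations": 19_150,
    "copy number mutations": 20_341,
    "structural rearrangements": 1_884,
}

# Fraction of annotated protein-coding genes touched by each class.
fractions = {kind: n / PROTEIN_CODING_GENES for kind, n in genes_affected.items()}

for kind, fraction in fractions.items():
    print(f"{kind}: {fraction:.1%}")
```

The takeaway is that nearly every annotated protein-coding gene carries at least one simple somatic mutation somewhere in the dataset, while structural rearrangements touch a much smaller share.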
  • 32. Key DCC Activities for 2013 • Improved data & metadata curation at EGA; better linking of data held at DCC to ICGC data in other repositories (currently not perfect) • Improved data quality/integrity checking through new submission/validation system; review of submission file specifications • Integration of new data submission system and portal infrastructure with project and user information managed at ICGC.org 32
  • 34. Where do you find that information? • We actually make it hard to find, but we are working on that! (this is an example of where ICGC would like to do what TCGA does!) • http://cancergenome.nih.gov/publications/publicationguidelines 34
  • 35. Where do you find that information? For ICGC data:
      • Need to find the policy!
      • http://icgc.org/icgc/goals-structure-policies-guidelines/e3-publication-policy
      • Find text:
          – Published: no embargo
          – < 100 tumors: 2 years
          – > 100 tumors: 1 year
      • Find date: in README on FTP file
      • (exception in README)
      • This is bad, we know it, and we are fixing it!
      • In doubt? Contact us! info@icgc.org
  • 36. Time limits for publication moratoriums: All data shall become free of a publication moratorium when either:
      1) the data is published by the ICGC member project;
      2) one year after a specified quantity of data (e.g. genome dataset from 100 tumours per project) has been released via the ICGC database or other public databases.
      3) In all cases data shall be free of a publication moratorium two years after its initial release.
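The three moratorium rules on slide 36 amount to taking the earliest of up to three dates. The sketch below is one reading of the slide, not official ICGC tooling: `moratorium_end`, `add_years`, and the parameter names are hypothetical, and the authoritative dates are the ones given in each project's README.

```python
from datetime import date
from typing import Optional


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 Feb to 28 Feb when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def moratorium_end(
    initial_release: date,
    threshold_release: Optional[date] = None,
    publication: Optional[date] = None,
) -> date:
    """Earliest date on which a dataset is free of the publication moratorium.

    initial_release:   first release of the data.
    threshold_release: date the specified quantity (e.g. 100 tumours) was
                       released, or None if not yet reached.
    publication:       date the member project published the data, or None.
    """
    candidates = [add_years(initial_release, 2)]            # rule 3
    if threshold_release is not None:
        candidates.append(add_years(threshold_release, 1))  # rule 2
    if publication is not None:
        candidates.append(publication)                      # rule 1
    return min(candidates)


# Hypothetical dates, for illustration only.
print(moratorium_end(date(2012, 3, 1)))
print(moratorium_end(date(2012, 3, 1), threshold_release=date(2012, 9, 1)))
print(moratorium_end(date(2012, 3, 1), publication=date(2013, 1, 15)))
```

Taking the minimum keeps the two-year limit as a hard ceiling, which matches the "in all cases" wording of rule 3.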
  • 39. Raw Data Availability at EGA by Project and Data Type • https://www.ebi.ac.uk/ega/organisations/EGAO00000000024 39
  • 40. Cooperation with EBI EGA Repository for Controlled Access Raw Data • Concerted efforts with EGA staff to support coordinated data submissions to both ICGC DCC & EGA • Infrastructure to grant controlled data access automatically on approval of ICGC DACO web application forms 40 40
  • 41. What the users see? • Important to have a data portal that represents the richness of the data that we generate, but to also make sure biologists and clinicians can actually use the data & make discoveries! • Important to have a scalable technology that will support 50,000 human genomes, and thousands of concurrent users (we don’t have that many yet) 41
  • 42. Uniform Annotations • Annotating Simple Somatic Mutations (SSM) and Simple Germline Variations (SGV) • DCC is currently implementing the snpEff software ◦ Recommended by the ICGC Bioinformatics Analysis Working Group ◦ Returns Sequence Ontology's controlled vocabulary regarding mutation-induced changes (www.sequenceontology.org) • ICGC members will not be required to annotate SSM and SGV for the ICGC data releases 42
  • 44. 44
  • 45. Select “Pancreatic cancer – Canada” 45
  • 46. … But where is the data? 46
  • 47. 47
  • 49. 49
  • 50. Highlights of the new portal: dcc.icgc.org
      • Faceted search capabilities for variants, genes and donors
          – Interactive data exploration, fast and easy
      • Mutation aggregation & counts across donors and cancers
          – # of pancreatic cancer donors with mutation KRAS G12D
      • Standardized gene consequence across all projects
      • Genome browser
      • Data download
      • Protein domains
      • Links to repositories
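The "mutation aggregation & counts across donors" highlight on slide 50 can be sketched as counting distinct donors per mutation and project. The records, donor IDs, and project labels below are invented for illustration and are not ICGC data; the portal's actual data model and indexing are not described on the slides.

```python
from collections import defaultdict

# Hypothetical (donor, project, gene, protein change) observations.
observations = [
    ("DO1", "Pancreatic cancer - Canada", "KRAS", "G12D"),
    ("DO1", "Pancreatic cancer - Canada", "KRAS", "G12D"),  # same donor, second specimen
    ("DO2", "Pancreatic cancer - Canada", "KRAS", "G12D"),
    ("DO3", "Pancreatic cancer - Canada", "KRAS", "G12V"),
    ("DO4", "Another project", "KRAS", "G12D"),
]


def donors_per_mutation(rows):
    """Count distinct donors for each (project, gene, protein change)."""
    donors = defaultdict(set)
    for donor, project, gene, change in rows:
        donors[(project, gene, change)].add(donor)
    return {key: len(ids) for key, ids in donors.items()}


counts = donors_per_mutation(observations)
print(counts[("Pancreatic cancer - Canada", "KRAS", "G12D")])
```

Collecting donor IDs in a set means a donor with several specimens carrying the same mutation is counted once, which is what a per-donor count requires.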
  • 51. Technologies Chaplin Brian O’Connor/ Vincent Ferretti Web GUI Indexing Processing & Data Model Core 51
  • 52. 52
  • 54.
      • Summary
      • Cancer type distribution
      • Other links (Cosmic, Entrez, etc)
      • Mutation profile in protein
      • Domains
      • Genomic Context
      • Mutation profile
      • Most common mutations
  • 56. 56
  • 57. 57
  • 58. 58
  • 60. 60
  • 61. Donor
      • Donor ID
      • Primary site
      • Cancer Project
      • Gender
      • Tumor Stage
      • Vital Status
      • Disease Status
      • Release type
      • Age at diagnosis
      • Available data types
      • Analysis types
  • 66. 66
  • 67. 67
  • 69. Can do bulk download of the data … 69
  • 72. ICGC Data Categories
      ICGC Open Access Datasets:
      • Cancer Pathology: Histologic type or subtype; Histologic nuclear grade
      • Donor: Gender; Age range
      • RNA expression (normalized)
      • DNA methylation
      • Genotype frequencies
      • Somatic mutations (SNV, CNV and Structural Rearrangement)
      ICGC Controlled Access Datasets:
      • Detailed Phenotype and Outcome Data: Patient demography; Risk factors; Examination; Surgery/Drugs/Radiation; Sample/Slide; Specific histological features; Protocol; Analyte/Aliquot
      • Gene Expression (probe-level data)
      • Raw genotype calls (germline)
      • Gene-sample identifier links
      • Genome sequence files
      Most of the data in the portal is publicly available without restriction. However, access to some data, like the germline mutations, requires authorization by the Data Access Compliance Office (DACO)
  • 73. Module 1: Cancer Genomic Databases bioinformatics.ca
  • 74. http://icgc.org/daco
  • 75. ICGC Controlled Access Datasets
      • Detailed Phenotype and Outcome data: Region of residence; Risk factors; Examination; Surgery; Radiation; Sample; Slide; Specific histological features; Analyte; Aliquot; Donor notes
      • Gene Expression (probe-level data)
      • Raw genotype calls
      • Gene-sample identifier links
      • Genome sequence files
      ICGC OA Datasets
      • Cancer Pathology: Histologic type or subtype; Histologic nuclear grade
      • Patient/Person: Gender, Age range, Vital status, Survival time, Relapse type, Status at follow-up
      • Gene Expression (normalized)
      • DNA methylation
      • Computed Copy Number and Loss of Heterozygosity
      • Newly discovered somatic variants
      http://goo.gl/w4mrV
  • 76. Identify yourself. Fill out detail form which includes:
      • Contact and Project Information
      • Information Technology details and procedures for keeping data secure
      • Data Access Agreement
      All of these documents are put into a PDF file that you print and get your institution to sign off on your behalf
  • 83. DACO approved projects: 59 groups - 75% academic (~400 people)
  • 84. DACO/DCC User Data Access Process • Users approved through DACO are now automatically granted access to ICGC controlled access datasets available through the ICGC Data Portal and the EBI’s EGA repository. [Diagram: DACO Web Application; application approved by DACO; DCC User Registry; user accounts activated; DCC Data Portal; EBI EGA] 84
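The open versus controlled access split, and the automatic granting of access after DACO approval described on slide 84, can be sketched as a simple tier check. Everything named here (`UserRegistry`, `can_download`, the data-type labels, the user IDs) is a hypothetical stand-in chosen for illustration; the real DCC user registry and its interfaces are not described on the slides.

```python
# Hypothetical labels for data categories named on the slides.
OPEN_ACCESS = {"somatic_mutations", "dna_methylation", "normalized_expression"}
CONTROLLED_ACCESS = {
    "germline_genotype_calls",
    "genome_sequence_files",
    "probe_level_expression",
}


class UserRegistry:
    """Toy stand-in for a user registry that tracks DACO-approved users."""

    def __init__(self) -> None:
        self._approved: set[str] = set()

    def approve(self, user_id: str) -> None:
        """Record that DACO has approved this user's application."""
        self._approved.add(user_id)

    def is_approved(self, user_id: str) -> bool:
        return user_id in self._approved


def can_download(registry: UserRegistry, user_id: str, data_type: str) -> bool:
    """Open data is available to anyone; controlled data needs DACO approval."""
    if data_type in OPEN_ACCESS:
        return True
    if data_type in CONTROLLED_ACCESS:
        return registry.is_approved(user_id)
    raise ValueError(f"unknown data type: {data_type}")


registry = UserRegistry()
registry.approve("alice")

print(can_download(registry, "bob", "somatic_mutations"))
print(can_download(registry, "bob", "germline_genotype_calls"))
print(can_download(registry, "alice", "germline_genotype_calls"))
```

Keeping approval in one registry that both the portal and the raw-data repository consult is the design point the slide makes: one DACO decision activates access in both places.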
  • 85. Future Work for the DCC
      • Work with projects to improve in a number of areas:
          – Clinical data content
          – Increasing frequency of data release
      • Better metadata collection from the EGA
          – Working with EGA to better match metadata requirements for ICGC member submissions; will enable reliable linking by Sample ID, Donor ID, etc. between data portal and EGA. Will allow direct link to DACO approved users
          – Projects will be required to provide this required metadata at submission time; existing EGA datasets will be updated.
      • Improve access to projects’ analysis methods
          – Suggested publishing analysis SOPs in Standards in Genomic Sciences at most recent ICGC workshop; haven’t seen any interest in doing this from member projects.
          – DCC to host centralized web page(s) for each project’s analysis methods; use permalink in submission files.
      • Better documentation … always need more!
      • Better transparency of processes
      • Better links to publications
  • 86. Future Work for the DCC
      • New releases:
          – Release 15: finished before Christmas
              • All data submissions sent in again, plus new data
              • (no methylation data)
          – Release 16: incremental submission + Methylation data, released before May
          – Release 17: adopt incremental for all data types, and increase frequency of releases.
  • 87. New Project: ICGC PANCANCER analysis
      • 2,000 Whole genome sequencing
          – 6 cloud infrastructures across the world
          – Appropriate policy and tool availability
          – Agreed upon shared pipelines, and others
          – Shared datasets
          – Petabytes of files, 10,000’s cores
          – Mutation analysis, as well as CNV, Structural, others when feasible (RNA and methylome).
  • 88. Challenges and Opportunity • Targeted sequencing for Patient Selection • Consent • Combinations • Corrected features and #features >> #samples • Noisy and incomplete data • Speed and cost We are also hiring! Adapted from Paul Rejto, Pfizer 88
  • 89. FGED’s mission: To be a positive agent of change in the effective sharing and reproducibility of functional genomic data fged.org 89
  • 90. Acknowledgments. http://oicr.on.ca http://icgc.org
      • ICGC Project leaders at the OICR: Tom Hudson, John McPherson, Lincoln Stein, Paul Boutros, Lakshmi Muthuswamy, Vincent Ferretti, Francis Ouellette, Jennifer Jennings
      • Ouellette Lab: Michelle Brazas, Emilie Chautard, Nina Palikuca, Matthew Ziembicki
      • FGED: Alvis Brazma, Roger Bumgarner, Cesare Furlanello, Michael Miller, Francis Ouellette, John Quackenbush (Dana-Farber), Michael Reich, Gabriella Rustici, Chris Stoeckert, Ronald Taylor, Steve Trutane, Jennifer Weller, Brian Wilhelm, Neil Winegarden
      • DCC Software Developer: Vincent Ferretti, Brian O’Connor, Junjun Zhang, Anthony Cros, Jonathan Guberman, Bob Tiernay, Shane Wilson, Long Yao, Daniel Chang, Jerry Lam, Stuart Watt
      • Web Dev: Miyuki Fukuma, Kamen Wu, Joseph Yamada, Salman Badr
      • Pipeline Development & Evaluation: Morgan Taschuk, Rob Denroche, Peter Ruzanov, Zhibin Lu
      • DCC Data Coordinator: Hardeep Nahal
      … and all the patients and their families that are putting their hopes into our work!
  • 92. Maya et Pascale, 2012 92

Editor's Notes

  1. Good idea
  2. http://www.ocib.ca/bioinformatics.html Icgc.org
  3. 13/16 non-TCGA projects have data under ICGC DAC in EGA! NCC liver exome study (EGAS00001000389) is not currently associated with ICGC DAC