Open Data is Essential for
Personalized Medicine
BF Francis Ouellette
https://goo.gl/8U1QJa
Open Data is Essential for Genomics
This presentation is on:
https://www.slideshare.net/
@bffo
E-mail: francis@genomequebec.com
Times I’ve been in Italy
• Trieste 1996: Last Yeast Genome Meeting
• Naples 2005: NETTAB “Workflows management:
new abilities for the biological information overflow”
• Rome 2017: Elixir
• Palermo 2017: NETTAB
Outline
• What I do
• Open Data in genomics
• Final thoughts
But first, a little about me …
… an unfinished story!
https://goo.gl/anu933
http://goo.gl/dJIur
http://goo.gl/LwVOZ
http://goo.gl/QI6aL
http://goo.gl/mYHFO
http://goo.gl/Jc5TK
https://goo.gl/3PFr7L
1993-1997
from the National Center for Biotechnology Information
PANIC
https://www.ubc.ca/
1999
2001: Human Genome Project
2003-2007
Toronto
2007-2017
International Cancer Genome Consortium
http://goo.gl/dJIur
2017- …
SABs, EBs & projects I’m on:
So what unifies all
of what I’ve done?
Helping scientists do science.
Open Data
https://goo.gl/Z63Wxp
Genomics
https://goo.gl/MX84KA
What am I calling “Genomics”?
All “omics”
– DNA and RNA, +Epigenomics
– Proteomics, +Protein Interactions, +Pathways
– Metabolomics
– Bioinformatics/Computational Biology
– All of the related data and metadata
• Phenotype
• Clinical
• Images
– New technologies …
Biological scope?
• Anything with DNA or RNA or protein
An example of a challenge for all of us:
The integration of genomic data
with deep learning and artificial
intelligence
AI, Big Data, Deep Computing
• Artificial Intelligence / Deep Learning and
the Big Data Hype?
https://goo.gl/WHg36Q
What do we need for that?
https://goo.gl/JWpXj2
What else?
• Data has to be FAIR
– TO BE FINDABLE
– TO BE ACCESSIBLE
– TO BE INTEROPERABLE
– TO BE RE-USABLE
• https://www.force11.org/group/fairgroup/fairprinciples
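As a loose illustration of the four FAIR properties listed above, the sketch below checks a dataset record for one field per property. The record layout, the field names and the `fair_gaps` helper are assumptions made for this example only; the FORCE11 principles are far richer than a four-field check.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical metadata record; one field per FAIR property."""
    identifier: str = ""   # persistent identifier (Findable)
    access_url: str = ""   # where the data can be retrieved (Accessible)
    data_format: str = ""  # standard, documented format (Interoperable)
    license: str = ""      # terms of reuse (Re-usable)


def fair_gaps(record):
    """Return the FAIR properties for which the record has no value."""
    checks = {
        "findable": bool(record.identifier),
        "accessible": bool(record.access_url),
        "interoperable": bool(record.data_format),
        "re-usable": bool(record.license),
    }
    return [name for name, ok in checks.items() if not ok]


# Made-up record with no license information.
record = DatasetRecord(
    identifier="doi:10.0000/example",
    access_url="https://example.org/data",
    data_format="VCF",
)
print(fair_gaps(record))
```

A record with an identifier, an access URL and a format but no license is reported as not yet re-usable.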
Big data examples
• Genomic sequences
• Imaging
• Population-scale data collected from wearables
Data Center for all in Québec?
• Health Care in Canada is governed
province by province.
• Génome Québec is working with various
ministries to set up something centralized and
useful that makes genomic data usable for
research (controlled access).
• Needs to include clinical data
“Building a data centre is
like making pancakes, you
always need to throw
away the 1st one”
Robert Grossman
Frederick H. Rawson Professor and
the Director of the Center for Data
Intensive Science (CDIS) at the
University of Chicago
http://rgrossman.com/
Sharing all data types,
including clinical data?
https://goo.gl/ofEPeX
Authors present at the
“Toronto meeting”
https://goo.gl/ofEPeX
Open data critical to
progress in Science
One example: GenBank
GenBank sequence
database is an open
access, annotated
collection of all publicly
available nucleotide
sequences and their
protein translations.
Open data critical to progress in Science
• Without GenBank and other public
sequence databases
– There would be no BLAST
– There would be no diagnostic DNA testing
– There would be no understanding of the
human genome (there probably would not
have been a human genome to work on in the
first place).
Adapted from Niko Beerenwinkel, Chris D. Greenman, Jens Lagergren:
Computational Cancer Biology: An Evolutionary Perspective
Published: February 4, 2016. https://doi.org/10.1371/journal.pcbi.1004717
ICGC PCAWG Docker Testing
Cancer is a Disease
of the Genome
Challenge in Treating Cancer:
• Every tumour is different
• Every cancer patient is different
Adapted from Tom Hudson
https://www.cancer.gov/research/areas/genomics
Analysis Data Types
• Simple Somatic Mutations (SSM or SNV)
• Copy Number Alterations (CNA or CNV)
• Structural Variants (SV)
• Germline variants (SNPs)
• Gene Expression (micro-arrays and RNASeq)
• miRNA Expression (RNASeq)
• Epigenomics (Arrays and Methylation)
• Splicing Variation (RNASeq)
• Protein Expression (Arrays)
International Cancer Genome Consortium
• Collect ~500 tumour/normal pairs from each of 50 different major
cancer types; 25,000 T/N pairs!
• Comprehensive genome analysis of each T/N pair:
– Genome
– Transcriptome
– Methylome
– Clinical data
• Make the data available to the research community & public.
Identify genome changes:
…GATTATTCCAGGTAT… …GATTATTGCAGGTAT… …GATTATTGCAGGTAT…
Adapted from Tom Hudson
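The idea of identifying genome changes can be sketched with the sequence fragments shown on this slide. This is only a toy comparison of two equal-length strings: real variant calling works on aligned reads with quality scores. Treating the first fragment as the reference is an assumption made here, and `point_differences` is a hypothetical helper name.

```python
def point_differences(reference, sample):
    """List (1-based position, reference base, sample base) for each mismatch."""
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Fragments copied from the slide; which one is the normal sample is assumed.
seq_a = "GATTATTCCAGGTAT"
seq_b = "GATTATTGCAGGTAT"

print(point_differences(seq_a, seq_b))  # → [(8, 'C', 'G')]
```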
International Cancer Genome Consortium: http://icgc.org
ICGC needs to deal with different
kinds of users!
• Biologists/Clinicians:
– Web interface to processed data, providing:
• Affected gene lists with consequences
• Impact on pathways
• Power users:
– Application Programming Interface (API) to get
to data
– Availability and Integration with cloud
resources
ICGC Data Coordinating Centre:
dcc.icgc.org
https://dcc.icgc.org/
https://dcc.icgc.org/icgc-in-the-cloud
http://www.cancercollaboratory.org/
Some challenges:
• So, we have lots of data; is
it all generated the same way?
Every country/group has basically
been submitting:
– Simple Somatic Mutations (SSM or SNV)
– Copy Number Alterations (CNA or CNV)
– Structural Variants (SV)
– Germline variants (SNPs)
– Gene Expression (micro-arrays and RNASeq)
– miRNA Expression (RNASeq)
– Epigenomics (Arrays and Methylation)
– Splicing Variation (RNASeq)
– Protein Expression (Arrays)
Are they all using the same
pipelines?
• No
Steering Committee of PCAWG
• Peter Campbell, Sanger Inst.
• Gady Getz, Broad
• Jan Korbel, EMBL
• Lincoln Stein, OICR
• Josh Stuart, UCSC
PanCancer Analysis of Whole
Genomes (PCAWG)
• Whole-genome analysis of > 2,800 T/N pairs with clinical data,
from 20 tumour types.
• Aligned with one standard pipeline.
• Genomic Variants determined with 3 pipelines
• 17 working groups
• > 50 Papers are being
written now.
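When variants are determined with several pipelines, one simple way to combine the results is to keep the calls that enough pipelines agree on. The slide does not say how PCAWG merged its call sets, so the majority rule, the `consensus_calls` helper and the variant tuples below are all assumptions made for illustration.

```python
from collections import Counter


def consensus_calls(call_sets, min_support=2):
    """Keep variants reported by at least `min_support` of the call sets."""
    support = Counter()
    for calls in call_sets:
        support.update(set(calls))  # each pipeline counts once per variant
    return sorted(variant for variant, count in support.items() if count >= min_support)


# Made-up calls: (chromosome, position, reference base, alternate base).
pipeline_a = {("chr1", 100, "C", "G"), ("chr2", 200, "A", "T")}
pipeline_b = {("chr1", 100, "C", "G"), ("chr3", 300, "G", "A")}
pipeline_c = {("chr1", 100, "C", "G"), ("chr2", 200, "A", "T")}

for variant in consensus_calls([pipeline_a, pipeline_b, pipeline_c]):
    print(variant)
```

With these inputs the chr1 and chr2 variants are kept and the chr3 variant, seen by only one pipeline, is dropped. Raising `min_support` to 3 would keep only the call that all three pipelines share.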
https://www.biorxiv.org/search/pcawg
Deliverables for PCAWG include:
• 1st PANCANCER analysis on > 2,800
cancer tumours from a WGS perspective
• RNA, SSM, CNV, Methylation analysis &
germline
• Published (executable) pipelines
– Docker / Dockstore
– Multiple cloud access to data
– Multiple portal access to data
https://dcc.icgc.org/pcawg
Working Groups (1/2)
1. Novel somatic mutation calling methods
2. Analysis of mutations in regulatory regions
3. Integration of transcriptome and genome
4. Integration of epigenome and genome
5. Consequences of somatic mutations on pathway
and network activity
6. Patterns of structural variations, signatures,
genomic correlations, retrotransposons, mobile
elements
7. Mutation signatures and processes
8. Germline cancer genome
Working Groups (2/2)
9. Inferring driver mutations and identifying cancer
genes and pathways
10. Translating cancer genomes to the clinic
11. Evolution and heterogeneity
12. Exploratory: portals, visualization and software
infrastructure
13. Molecular subtypes and classification
14. Analysis of mutations in non-coding RNA
15. Exploratory: mitochondrial
16. Exploratory: pathogens
17. Technical working group
https://goo.gl/AMxwSU
http://dockstore.org
Docker Testing Group
• Group that ensures all container
workflows work as expected.
https://goo.gl/AMxwSU
Access to Data?
• Human Data
• Patients consented to have their DNA
looked at so people could understand
cancer
• Need to have a system to maximize
people’s gift to science.
Identify yourself
Fill out a detailed form, which includes:
• Contact and project information
• Information technology details and procedures for keeping data secure
• Data Access Agreement
All of these documents are put into a PDF file that you print and get your
institution to sign off on your behalf.
https://icgc.org/daco/approved-projects
314 groups
[Figure labels: DACO, ICGC, dbGaP, GDC, EGA, TCGA, ERA, Open, BAM, WGS, Germline, EGA id & password]
Challenge:
• Open Data and controlled access data
• Not enough eyeballs on the data
• Eyeballs on the data needed to make
discoveries.
https://goo.gl/ogbWXG
Culture of Sharing Openly
• Public Funding agencies
• Consortiums
• Mentors
• Peers
• New generation (vs my old generation)
• Has to become the norm
Final thoughts …
• Access to data is essential for science
• Getting data that is FAIR is hard work
• It is essential to share the work you do if
you want to be recognized, get tenure, get
a job or a promotion.
• Human data is more complicated, but
don’t let that get in the way!
• There is a lot of material out there, learn
from it (& cite your sources)!
Last message to students, young
postdoctoral fellows (PDFs) and investigators:
Be open so people
can see how great
you are!
Acknowledgments
DCC Software Developer
Vincent Ferretti
Dusan Andric
Phuong-My Do
Francois Gerthoffert
Terry Lin
Michael Moncada
Vitalii Slobodianyk
Bob Tiernay
Douglas Wong
Linda Xiang
Junjun Zhang
ICGC/OICR
Project leaders:
Tom Hudson
John McPherson
Lincoln Stein
Jared Simpson
Paul Boutros
Vincent Ferretti
Francis Ouellette
Jennifer Jennings
Ouellette Lab
Alysha Moncrieffe
Ann Meyer
Zhibin Lu
Web Dev
Joseph Yamada
Kaman Wu
Kim Cullion
Koji Miyauchi
Miyuki Fukuma
ICGC DCC Biocuration
Hardeep Nahal
Marc Perry
Research IT/Systems
David Sutton
Bob Gibson
David Magda
Rob Naccarato
Brian Ott
Gino Yearwood
EGA
Jordi Rambla De Argila
Arcadi Navarro
Audald Iloret
Mauricio Moldes
http://oicr.on.ca http://icgc.org
… and all the patients and their
families that are putting
their hopes into our work!
Scientific Affairs Team (Équipe des affaires scientifiques)
27 March 2017
B.F. Francis Ouellette
Annina Spilker
Joël Savard
Diana Iglesias
Diane Bouchard
Cristina Ciurli
Micheline Ayoub
Hélène Fournier
Grazie
Body fluids_tonicity_dehydration_hypovolemia_hypervolemia.pptx
muralinath2
 
Comparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebratesComparative structure of adrenal gland in vertebrates
Comparative structure of adrenal gland in vertebrates
sachin783648
 


Open data genomics_palermo_2017_ver03

  • 1. Open Data is Essential for Personalized Medicine BF Francis Ouellette https://goo.gl/8U1QJa
  • 2. Open Data is Essential for Genomics This presentation is on: https://www.slideshare.net/
  • 3. Module #: Title of Module
  • 4. Open Data is Essential for Genomics
  • 5. Open Data is Essential for Genomics @bffo E-mail: francis@genomequebec.com
  • 6. Open Data is Essential for Genomics Times I’ve been in Italy • Trieste 1996: Last Yeast Genome Meeting • Naples 2005: NETTAB “Workflows management: new abilities for the biological information overflow” • Rome 2017: Elixir • Palermo 2017: NETTAB
  • 7. Open Data is Essential for Genomics Outline • What I do • Open Data in genomics • Final thoughts
  • 8. Open Data is Essential for Genomics But first, a little about me … … an unfinished story!
  • 9. Open Data is Essential for Genomics https://goo.gl/anu933
  • 10. Open Data is Essential for Genomics http://goo.gl/dJIur
  • 11. Open Data is Essential for Genomics http://goo.gl/LwVOZ
  • 12. Open Data is Essential for Genomics http://goo.gl/QI6aL
  • 13. Open Data is Essential for Genomics http://goo.gl/mYHFO
  • 14. Open Data is Essential for Genomics http://goo.gl/Jc5TK
  • 15. Open Data is Essential for Genomics https://goo.gl/3PFr7L 1993-1997
  • 16. Open Data is Essential for Genomics from the National Center for Biotechnology Information
  • 17. Open Data is Essential for Genomics from the National Center for Biotechnology Information
  • 18. Open Data is Essential for Genomics from the National Center for Biotechnology Information PANIC
  • 19. Open Data is Essential for Genomics
  • 20. Open Data is Essential for Genomics PANIC
  • 21. Open Data is Essential for Genomics PANIC
  • 22. Open Data is Essential for Genomics
  • 23. Open Data is Essential for Genomics https://www.ubc.ca/
  • 24. Open Data is Essential for Genomics 1999
  • 25. Open Data is Essential for Genomics 2001: Human Genome Project
  • 26. Open Data is Essential for Genomics 2003-2007
  • 27. Open Data is Essential for Genomics
  • 28. Open Data is Essential for Genomics Toronto
  • 29. Open Data is Essential for Genomics 2007-2017
  • 30. Open Data is Essential for Genomics International Cancer Genome Consortium
  • 31. Open Data is Essential for Genomics http://goo.gl/dJIur
  • 32. Open Data is Essential for Genomics 2017- …
  • 33. Open Data is Essential for Genomics
  • 34. Open Data is Essential for Genomics SABs, EBs & projects I’m on:
  • 35. Open Data is Essential for Genomics
  • 36. Open Data is Essential for Genomics So what unifies all of what I’ve done?
  • 37. Open Data is Essential for Genomics So what unifies all of what I’ve done? Helping scientists do science.
  • 38. Open Data is Essential for Genomics Open Data https://goo.gl/Z63Wxp
  • 39. Open Data is Essential for Genomics Genomics https://goo.gl/MX84KA
  • 40. Open Data is Essential for Genomics What am I calling “Genomics”? All “omics” – DNA and RNA, +Epigenomics – Proteomics, +Protein Interactions, +Pathways – Metabolomics – Bioinformatics/Computational Biology – All of the related data and metadata • Phenotype • Clinical • Images – New technologies …
  • 41. Open Data is Essential for Genomics Biological scope? • Anything with DNA or RNA or protein
  • 42. Open Data is Essential for Genomics
  • 43. Open Data is Essential for Genomics An example of a challenge for all of us: the integration of genomic data with deep learning and artificial intelligence
  • 44. Open Data is Essential for Genomics AI, Big Data, Deep Computing • Artificial Intelligence / Deep Learning and the Big Data Hype? https://goo.gl/WHg36Q
  • 45. Open Data is Essential for Genomics What do we need for that? https://goo.gl/JWpXj2
  • 46. Open Data is Essential for Genomics What do we need for that? https://goo.gl/JWpXj2
  • 47. Open Data is Essential for Genomics What else? • Data has to be FAIR – TO BE FINDABLE – TO BE ACCESSIBLE – TO BE INTEROPERABLE – TO BE RE-USABLE • https://www.force11.org/group/fairgroup/fairprinciples
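One way to make the FAIR checklist on this slide concrete is to ask, for a given dataset record, which facets are not yet covered. The sketch below is only an illustration: the field names (`identifier`, `access_url`, `format`, `license`) and their mapping to FAIR facets are assumptions made for this example, not part of the FAIR principles text or of any real repository schema.

```python
# Hypothetical metadata fields; the mapping to FAIR facets is illustrative only.
FAIR_FIELDS = {
    "findable": ["identifier", "title"],
    "accessible": ["access_url"],
    "interoperable": ["format"],
    "reusable": ["license"],
}


def missing_fair_fields(record):
    """Return, per FAIR facet, the fields that are absent or empty in record."""
    gaps = {}
    for facet, fields in FAIR_FIELDS.items():
        absent = [name for name in fields if not record.get(name)]
        if absent:
            gaps[facet] = absent
    return gaps


# A made-up record that can be found and parsed, but not yet accessed or reused.
example = {
    "identifier": "doi:10.0000/example",
    "title": "Demo dataset",
    "format": "VCF",
}
print(missing_fair_fields(example))
```

A record that returns an empty dictionary has a value for every field in this toy checklist; that is a necessary first step, not proof that the data is FAIR in practice.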
  • 48. Open Data is Essential for Genomics Big data examples • Genomic sequences • Imaging • Population scale collected wearable data
  • 49. Open Data is Essential for Genomics Data Center for all in Québec? • Health Care in Canada is governed province by province. • Génome Québec is working with various ministries to set up something that could be useful/centralized and make genomic data usable for research (controlled access). • Needs to include clinical data
  • 50. Open Data is Essential for Genomics “Building a data centre is like making pancakes, you always need to throw away the 1st one” Robert Grossman Frederick H. Rawson Professor and the Director of the Center for Data Intensive Science (CDIS) at the University of Chicago http://rgrossman.com/
  • 51. Open Data is Essential for Genomics Sharing all data types, including clinical data? https://goo.gl/ofEPeX
  • 52. Open Data is Essential for Genomics Authors present at the “Toronto meeting” https://goo.gl/ofEPeX
  • 53. Open Data is Essential for Genomics Introduction 1.0 Open data critical to progress in Science
  • 54. Open Data is Essential for Genomics Introduction 1.0 One example: GenBank GenBank sequence database is an open access, annotated collection of all publicly available nucleotide sequences and their protein translations.
  • 55. Open Data is Essential for Genomics Introduction 1.0 Open data critical to progress in Science • Without GenBank and other public sequence databases – There would be no BLAST – There would be no diagnostic DNA testing – There would be no understanding of the human genome (there probably would not have been a human genome to work on in the first place).
  • 56. Open Data is Essential for Genomics ICGC PCAWG Docker Testing. Adapted from Niko Beerenwinkel, Chris D. Greenman, Jens Lagergren: Computational Cancer Biology: An Evolutionary Perspective. Published: February 4, 2016. https://doi.org/10.1371/journal.pcbi.1004717
  • 57. Open Data is Essential for Genomics Cancer is a Disease of the Genome Challenge in Treating Cancer: • Every tumour is different • Every cancer patient is different Adapted from Tom Hudson https://www.cancer.gov/research/areas/genomics
  • 58. Open Data is Essential for Genomics Analysis Data Types • Simple Somatic Mutations (SSM or SNV) • Copy Number Alterations (CNA or CNV) • Structural Variants (SV) • Germline variants (SNPs) • Gene Expression (micro-arrays and RNASeq) • miRNA Expression (RNASeq) • Epigenomics (Arrays and Methylation) • Splicing Variation (RNASeq) • Protein Expression (Arrays)
  • 59. Open Data is Essential for Genomics International Cancer Genome Consortium • Collect ~500 tumour/normal pairs from each of 50 different major cancer types; 25,000 T/N pairs! • Comprehensive genome analysis of each T/N pair: – Genome – Transcriptome – Methylome – Clinical data • Make the data available to the research community & public. Identify genome changes …GATTATTCCAGGTAT… …GATTATTGCAGGTAT… …GATTATTGCAGGTAT… Adapted from Tom Hudson
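The "Identify genome changes" fragment on this slide shows reads that differ at a single base. A minimal sketch of that comparison follows; it assumes two already-aligned sequences of equal length, and treating the first fragment as the reference is an assumption made here, since the slide does not say which read is tumour and which is normal. Real pipelines work on aligned reads with quality scores and are far more involved.

```python
def point_differences(reference, sample):
    """List (position, reference_base, sample_base) for each mismatch.

    Positions are 0-based. Both sequences must already be aligned and of
    equal length; insertions and deletions are not handled in this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must have the same length")
    return [
        (index, ref_base, sample_base)
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# The two distinct fragments shown on the slide.
reference = "GATTATTCCAGGTAT"
sample = "GATTATTGCAGGTAT"
print(point_differences(reference, sample))  # [(7, 'C', 'G')]
```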
  • 60. ONTARIO INSTITUTE FOR CANCER RESEARCH
  • 61. Open Data is Essential for Genomics International Cancer Genome Consortium: http://icgc.org
  • 62. Open Data is Essential for Genomics ICGC needs to deal with different kinds of users! • Biologists/Clinicians: – Web interface to processed data, providing: • Affected gene lists with consequences • Impact on pathways • Power users: – Application Programming Interface (API) to get to data – Availability and Integration with cloud resources
  • 63. Open Data is Essential for Genomics ICGC Data Coordinating Centre: dcc.icgc.org
  • 64. Open Data is Essential for Genomics https://dcc.icgc.org/
  • 65. Open Data is Essential for Genomics https://dcc.icgc.org/icgc-in-the-cloud
  • 66. Open Data is Essential for Genomics http://www.cancercollaboratory.org/
  • 67. Open Data is Essential for Genomics Some challenges: • So, we have lots of data; was it all generated the same way?
  • 68. Open Data is Essential for Genomics Every country/group has basically been submitting: – Simple Somatic Mutations (SSM or SNV) – Copy Number Alterations (CNA or CNV) – Structural Variants (SV) – Germline variants (SNPs) – Gene Expression (micro-arrays and RNASeq) – miRNA Expression (RNASeq) – Epigenomics (Arrays and Methylation) – Splicing Variation (RNASeq) – Protein Expression (Arrays)
  • 69. Open Data is Essential for Genomics Are they all using the same pipelines? • No
  • 70. Open Data is Essential for Genomics
  • 71. Open Data is Essential for Genomics Steering Committee of PCAWG • Peter Campbell, Sanger Inst. • Gady Getz, Broad • Jan Korbel, EMBL • Lincoln Stein, OICR • Josh Stuart, UCSC
  • 72. Open Data is Essential for Genomics PanCancer Analysis of Whole Genomes (PCAWG) • > 2,800 T/N pairs with clinical data and whole-genome analysis, from 20 tumour types. • Aligned with one standard pipeline. • Genomic variants determined with 3 pipelines • 17 working groups • > 50 papers are being written now.
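When variants are determined with several pipelines, their call sets have to be reconciled somehow. The slides do not say how PCAWG did this, so the sketch below shows only one simple option, assumed for illustration: keep a variant when at least two of the three pipelines report it. The `(chromosome, position, ref, alt)` tuples and the pipeline names are made up for the example.

```python
from collections import Counter


def consensus_calls(callsets, min_support=2):
    """Return the variants reported by at least min_support call sets.

    Each call set is an iterable of hashable variant keys, here
    (chromosome, position, ref, alt) tuples. Duplicates inside a single
    call set are counted once.
    """
    support = Counter()
    for calls in callsets:
        support.update(set(calls))
    return {variant for variant, count in support.items() if count >= min_support}


# Hypothetical output of three variant-calling pipelines on the same sample.
pipeline_a = {("chr1", 100, "C", "G"), ("chr2", 200, "A", "T")}
pipeline_b = {("chr1", 100, "C", "G"), ("chr3", 300, "G", "A")}
pipeline_c = {("chr1", 100, "C", "G"), ("chr2", 200, "A", "T")}

consensus = consensus_calls([pipeline_a, pipeline_b, pipeline_c])
print(sorted(consensus))
```

Raising `min_support` to 3 keeps only the calls every pipeline agrees on, trading sensitivity for precision.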
  • 73. Open Data is Essential for Genomics https://www.biorxiv.org/search/pcawg
  • 74. Open Data is Essential for Genomics Deliverables for PCAWG include: • 1st PANCANCER analysis on > 2,800 cancer tumours from a WGS perspective • RNA, SSM, CNV, Methylation analysis & germline • Published (executable) pipelines – Docker / Dockstore – Multiple cloud access to data – Multiple portal access to data
  • 75. Open Data is Essential for Genomics https://dcc.icgc.org/pcawg
  • 76. Open Data is Essential for Genomics Working Groups (1/2) 1. Novel somatic mutation calling methods 2. Analysis of mutations in regulatory regions 3. Integration of transcriptome and genome 4. Integration of epigenome and genome 5. Consequences of somatic mutations on pathway and network activity 6. Patterns of structural variations, signatures, genomic correlations, retrotransposons, mobile elements 7. Mutation signatures and processes 8. Germline cancer genome
  • 77. Open Data is Essential for Genomics Working Groups (2/2) 9. Inferring driver mutations and identifying cancer genes and pathways 10. Translating cancer genomes to the clinic 11. Evolution and heterogeneity 12. Exploratory: portals, visualization and software infrastructure 13. Molecular subtypes and classification 14. Analysis of mutations in non-coding RNA 15. Exploratory: mitochondrial 16. Exploratory: pathogens 17. Technical working group
  • 78. Open Data is Essential for Genomics https://goo.gl/AMxwSU
  • 79. Open Data is Essential for Genomics https://goo.gl/AMxwSU
  • 80. Open Data is Essential for Genomics https://goo.gl/AMxwSU
  • 81. Open Data is Essential for Genomics https://goo.gl/AMxwSU
  • 82. Open Data is Essential for Genomics http://dockstore.org
  • 83. Open Data is Essential for Genomics Docker Testing Group • Group that ensures all containerized workflows work as expected. https://goo.gl/AMxwSU
  • 84. Open Data is Essential for Genomics Access to Data? • Human Data • Patients consented to have their DNA looked at so people could understand cancer • Need to have a system to maximize people’s gift to science.
  • 85. Open Data is Essential for Genomics
  • 86. Open Data is Essential for Genomics Identify yourself Fill out a detailed form, which includes: • Contact and Project Information • Information Technology details and procedures for keeping data secure • Data Access Agreement All of these documents are put into a PDF file that you print and have your institution sign on your behalf
  • 87. Open Data is Essential for Genomics
  • 88. Open Data is Essential for Genomics
  • 89. Open Data is Essential for Genomics https://icgc.org/daco/approved-projects 314 groups
  • 90. Open Data is Essential for Genomics [Diagram of data access routes; labels: DACO, ICGC, dbGaP, GDC, EGA, TCGA, ERA, Open, BAM, WGS, Germline, EGA id & password]
  • 91. Open Data is Essential for Genomics Challenge: • Open Data and controlled access data • Not enough eyeballs on the data • Eyeballs on the data needed to make discoveries. https://goo.gl/ogbWXG
  • 92. Open Data is Essential for Genomics Culture of Sharing Openly • Public Funding agencies • Consortiums • Mentors • Peers • New generation (vs my old generation) • Has to become the norm
  • 93. Open Data is Essential for Genomics Final thoughts … • Access to data is essential for science • Getting data that is FAIR is hard work • It is essential to share the work you do if you want to be recognized, get tenure, get a job or a promotion. • Human data is more complicated, but don’t let that get in the way! • There is a lot of material out there, learn from it (& cite your sources)!
  • 94. Open Data is Essential for Genomics Last message to students and young PDFs and investigators:
  • 95. Open Data is Essential for Genomics Last message to students and young PDFs and investigators: Be open so people can see how great you are!
  • 96. ONTARIO INSTITUTE FOR CANCER RESEARCH
  • 97. Open Data is Essential for Genomics Acknowledgments. DCC Software Developer: Vincent Ferretti, Dusan Andric, Phuong-My Do, Francois Gerthoffert, Terry Lin, Michael Moncada, Vitalii Slobodianyk, Bob Tiernay, Douglas Wong, Linda Xiang, Junjun Zhang; ICGC/OICR Project leaders: Tom Hudson, John McPherson, Lincoln Stein, Jared Simpson, Paul Boutros, Vincent Ferretti, Francis Ouellette, Jennifer Jennings; Ouellette Lab: Alysha Moncrieffe, Ann Meyer, Zhibin Lu; Web Dev: Joseph Yamada, Kaman Wu, Kim Cullion, Koji Miyauchi, Miyuki Fukuma; ICGC DCC Biocuration: Hardeep Nahal, Marc Perry; Research IT/Systems: David Sutton, Bob Gibson, David Magda, Rob Naccarato, Brian Ott, Gino Yearwood; EGA: Jordi Rambla De Argila, Arcadi Navarro, Audald Iloret, Mauricio Moldes; … and all the patients and their families that are putting their hopes into our work! http://oicr.on.ca http://icgc.org
  • 98. Scientific Affairs Team, 27 March 2017: B.F. Francis Ouellette, Annina Spilker, Joël Savard, Diana Iglesias, Diane Bouchard, Cristina Ciurli, Micheline Ayoub, Hélène Fournier
  • 99. Open Data is Essential for Genomics Grazie
  • 100. Open Data is Essential for Genomics