Cancer moonshot and data sharing
1. US Beau Biden Cancer Moonshot,
Data Sharing,
NCI Genomic Data Commons
Warren Kibbe, PhD
warren.kibbe@nih.gov
@wakibbe
May 25th, 2017
2. NCI Mission
To develop the knowledge base that will lessen the burden of cancer in the United States and around the world.
3.
In 2016 there were an estimated
1,700,000 new cancer cases
and
600,000 cancer deaths
- American Cancer Society
Cancer remains the second most common cause of
death in the U.S.
- Centers for Disease Control and Prevention
4.
In 2016 there were an estimated
15,500,000
cancer survivors in the U.S.
5.
Understanding Cancer
Precision medicine will lead to a fundamental understanding of the complex interplay between genetics, epigenetics, nutrition, environment and clinical presentation, and will direct effective, evidence-based prevention and treatment.
6.
Cancer is a grand challenge
Requires:
• Deep biological understanding
• Advances in scientific methods
• Advances in instrumentation
• Advances in technology
• Data and computation
• Mathematical models
Cancer research and care generate detailed data that are critical to creating a learning health system for cancer.
7. How do we solve problems in cancer?
Support and incentives for team science, collaboration
We need FAIR, open data
Support open source, open science
Support for rapid innovation
8.
Cancer Moonshot
Precision Medicine Initiative (PMI)
National Strategic Computing Initiative (NSCI)
Making data available: Genomic Data Commons
Using the cloud: NCI Cloud Pilots
International collaboration: Proteogenomics
Investigate, explore, predict using real-world data
9.
[Figure: 2006-2015: A Decade of Illuminating the Underlying Causes of Primary Untreated Tumors: Omics Characterization (10,000+ patient tumors and increasing). Courtesy of P. Kuhn (USC)]
10. The Cancer Imaging Archive: http://cancerimagingarchive.net
• 33,000 total subjects in the archive
• 67 data sets currently available
• 21 from The Cancer Genome Atlas project
• 10 from the Quantitative Imaging Network
• Clinical trial data from ECOG-ACRIN and RTOG
11.
The application of cancer genomics is changing (NCI-MATCH)
https://www.cancer.gov/about-cancer/treatment/clinical-trials/nci-supported/nci-match
12.
MATCH and Precision Oncology
It isn’t just about matching patients to therapy; it is also about avoiding therapies that will not work.
Biology is complex, and we still have a lot of basic biology to understand.
Genomics, imaging, and clinical lab data are the first wave of precision oncology.
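To make both halves of that point concrete, here is a minimal Python sketch of rule-based arm assignment that matches a patient to an arm and also screens an arm out when the tumor profile carries an alteration expected to block benefit. The arm names, alteration labels, and the `TreatmentArm`/`match_arms` helpers are hypothetical illustrations, not NCI-MATCH's actual arms or assignment algorithm, and the sketch carries no clinical meaning.

```python
from dataclasses import dataclass, field


@dataclass
class TreatmentArm:
    """Hypothetical treatment arm defined by molecular inclusion/exclusion rules."""
    name: str
    required_alteration: str  # alteration that makes a patient eligible
    excluding_alterations: set[str] = field(default_factory=set)  # alterations predicting no benefit


def match_arms(patient_alterations: set[str], arms: list[TreatmentArm]) -> list[str]:
    """Return arms whose required alteration is present and no excluding alteration is."""
    matches = []
    for arm in arms:
        if arm.required_alteration not in patient_alterations:
            continue  # no actionable target for this arm
        if arm.excluding_alterations & patient_alterations:
            continue  # avoid a therapy the profile suggests will not work
        matches.append(arm.name)
    return matches


if __name__ == "__main__":
    arms = [
        TreatmentArm("ARM_X", "GENE1_mut", {"GENE2_mut"}),  # hypothetical labels
        TreatmentArm("ARM_Y", "GENE3_amp"),
    ]
    print(match_arms({"GENE1_mut", "GENE3_amp"}, arms))
```

A real assignment process also weighs levels of evidence and eligibility criteria beyond the molecular profile; the sketch only shows the include/exclude logic.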
13.
Biology and medicine are now data-intensive enterprises
Scale is rapidly changing
Technology, data, computing and IT are pervasive in the lab, in the clinic, in the home, and across the population
18.
Expert Systems vs Machine Learning
In 1945, the British philosopher Gilbert Ryle identified two kinds of knowledge: factual, propositional knowledge that can be ordered into rules (“knowing that”), and implicit, experiential, skill-based knowledge (“knowing how”).
Machine learning is based on ‘knowing how’: it learns from examples.
Expert systems, or rule-based machines, are based on ‘knowing that’.
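The distinction can be shown in a few lines of Python: a hand-written rule encodes ‘knowing that’, while a threshold fitted to labeled examples stands in for ‘knowing how’. The feature, the threshold of 5.0, and the toy data are hypothetical, and minimizing training errors over candidate thresholds is about the simplest possible learning method, not a representative machine-learning system.

```python
def rule_based(score: float) -> bool:
    """'Knowing that': an expert states the rule explicitly (hypothetical threshold)."""
    return score >= 5.0


def learn_threshold(examples: list[tuple[float, bool]]) -> float:
    """'Knowing how': choose the threshold that misclassifies the fewest labeled examples."""
    candidates = sorted({x for x, _ in examples})

    def errors(t: float) -> int:
        # Count examples where the prediction (x >= t) disagrees with the label.
        return sum((x >= t) != label for x, label in examples)

    return min(candidates, key=errors)


def learned_classifier(examples: list[tuple[float, bool]]):
    """Build a classifier whose rule was inferred from data rather than written down."""
    t = learn_threshold(examples)
    return lambda score: score >= t


if __name__ == "__main__":
    data = [(1.0, False), (2.0, False), (3.0, True), (4.0, True)]  # toy labeled data
    clf = learned_classifier(data)
    print(rule_based(3.5), clf(3.5))
```

The two classifiers disagree on the same input because one rule was written by hand and the other was inferred from the examples.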
21.
The Beau Biden Cancer Moonshot
• Accelerate progress in cancer, including prevention & screening
  • From cutting-edge basic research to wider uptake of standard of care
• Encourage greater cooperation and collaboration
  • Within and between academia, government, and the private sector
• Enhance data sharing
Blue Ribbon Panel recommendations (Oct); Implementation Working Groups established (Jan)
cancer.gov/brp
22. Blue Ribbon Panel: Members & Working Groups
• 28 members
• Clinicians, researchers, advocates, and representatives from pharma and IT
• Three face-to-face meetings to identify “Moonshot” recommendations
• 7 working groups
  • Clinical trials, enhanced data sharing, cancer immunology, tumor evolution, implementation science, pediatric cancer, precision prevention and early detection
• Met weekly for 6 weeks to generate 2-3 recommendations per working group
• More than 150 people took part in the working groups
24. Blue Ribbon Panel Recommendations
• Network for Direct Patient Engagement
• Cancer Immunotherapy Translational Science Network
• Therapeutic Target Identification to Overcome Drug Resistance
• A National Cancer Data Ecosystem for Sharing and Analysis
• Fusion Oncoproteins in Childhood Cancers
• Symptom Management Research
• Prevention and Early Detection – Implementation of Evidence-based Approaches
• Retrospective Analysis of Biospecimens from Patients Treated with Standard of Care
• Generation of 4D Human Tumor Atlas
• Development of New Enabling Cancer Technologies
25. Vision:
Enable the creation of a Learning Healthcare System
for Cancer, where as a nation we learn from the
contributed knowledge and experience of every
cancer patient. As part of the Cancer Moonshot, we
want to unleash the power of data to enhance,
improve, and inform the journey of every cancer patient
from the point of diagnosis through survivorship.
26.
GDC as an example of a new
architecture for storing and sharing
cancer data
27.
The NCI Genomic Data Commons (GDC) is an existing effort to standardize and simplify the submission of genomic data to NCI, following the FAIR principles: Findable, Accessible, Attributable, Interoperable, Reusable, and Provide Recognition.
The GDC is part of the NIH Big Data to Knowledge (BD2K) initiative and an example of the NIH Commons.
Genomic Data Commons
Microattribution, nanopublications, tracking of data use, data annotation, and tracking of algorithm use support the data/software/metadata life cycle, making it possible to give credit for, and analyze the impact of, data, software, analytics, algorithms, curation and knowledge sharing.
Force11 white paper
https://www.force11.org/group/fairgroup/fairprinciples
28.
Data Sharing and the FAIR Principles
FAIR: making data Findable, Accessible, Attributable, Interoperable, Reusable, and providing Recognition
Force11 white paper
https://www.force11.org/group/fairgroup/fairprinciples
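One way to operationalize these principles is a completeness check on a dataset's metadata record. The sketch below maps each FAIR property named on the slide to a single metadata field; the field names (`identifier`, `access_url`, and so on) and the `fair_gaps` helper are hypothetical, and a real commons would check far more than whether a field is present.

```python
# Hypothetical mapping from each FAIR property on the slide to one metadata field.
REQUIRED_FIELDS = {
    "Findable": "identifier",       # persistent identifier
    "Accessible": "access_url",     # where and how the data can be retrieved
    "Attributable": "creator",      # who generated the data
    "Interoperable": "vocabulary",  # controlled vocabulary used for annotations
    "Reusable": "license",          # terms of reuse
    "Recognition": "citation",      # how to credit the contributors
}


def fair_gaps(record: dict) -> list[str]:
    """Return the FAIR properties whose metadata field is missing or empty."""
    return [prop for prop, key in REQUIRED_FIELDS.items() if not record.get(key)]


if __name__ == "__main__":
    record = {
        "identifier": "example-0001",
        "access_url": "https://example.org/data/0001",
        "creator": "Example Lab",
        "vocabulary": "hypothetical-vocab",
        "license": "CC-BY-4.0",
    }
    print(fair_gaps(record))
```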
29. NCI Genomic Data Commons
Launched at ASCO on June 6, 2016
https://gdc-portal.nci.nih.gov
2.6 PB of legacy data and 1.5 PB of harmonized data.
30. At the June 29th Cancer Moonshot Summit, Foundation Medicine announced the release of 18,000 genomic profiles to the NCI GDC
31. • MMRF (Multiple Myeloma Research Foundation) is the first non-profit organization to upload information to the GDC
• Among its contributions will be data from the Relating Clinical Outcomes in MM to Personal Assessment of Genetic Profile (CoMMpass) study, which began in 2011 and has so far enrolled over 1,150 patients
• Over the next eight years, patients in CoMMpass will undergo a repeat biopsy and a new genomic analysis at each six-month checkup and/or at disease progression
• Tumor samples are being collected and analyzed, when possible, at the time of any relapse. New data will be deposited at least every six months
32. GDC Acknowledgements
NCI Center for Cancer Genomics
Univ. of Chicago
Bob Grossman
Allison Heath
Mike Ford
Zhenyu Zhang
Ontario Institute for Cancer Research
Lou Staudt
Zhining Wang
Martin Ferguson
JC Zenklusen
Daniela Gerhard
Deb Steverson
Vincent Ferretti
Francois Gerthoffert
JunJun Zhang
Leidos Biomedical Research
Mark Jensen
Sharon Gaheen
Himanso Sahni
NCI CBIIT
Tony Kerlavage
Tanja Davidsen
33.
NCI Cancer Genomics Cloud Pilots
Goal: to democratize access to NCI-generated genomic and related data, and to create a cost-effective way to provide scalable computational capacity to the cancer research community.
Cloud Pilots provide:
• Access to large genomic data sets without the need to download them
• Access to popular pipelines and visualization tools
• The ability for researchers to bring their own tools and pipelines to the data
• The ability for researchers to bring their own data and analyze it in combination with existing genomic data
• Workspaces for researchers to save and share their data and analysis results
34. Workspace: an isolated environment for collaborative analysis
Data + Methods → Results
• Data: sample data and metadata (e.g. BAMs, tissue type)
• Methods: algorithms (e.g. mutation calling) and wiring logic (e.g. use the exome capture BAM)
• Results: executions and results (e.g. run mutation caller v41 on this exact BAM and track results)
Slide courtesy of Broad Institute
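The “run mutation caller v41 on this exact BAM and track results” idea amounts to provenance tracking: a record tying a specific tool version and wiring choice to content fingerprints of its input and output. This is a minimal sketch, assuming SHA-256 hashes as fingerprints; the `Execution` record, the `record_execution` helper, and the stand-in caller are hypothetical and are not FireCloud's actual data model.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


def sha256_of(data: bytes) -> str:
    """Content fingerprint used to pin down 'this exact BAM' (or an output)."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Execution:
    """Hypothetical provenance record linking data, method, and result."""
    tool: str           # method name, e.g. a mutation caller
    tool_version: str   # e.g. "v41"
    wiring: str         # which input the wiring logic selected
    input_sha256: str   # fingerprint of the exact input file contents
    output_sha256: str  # fingerprint of the result


def record_execution(tool: str, version: str, wiring: str,
                     input_bytes: bytes, run: Callable[[bytes], bytes]) -> Execution:
    """Run `run` on the input and return a record of exactly what was executed."""
    output = run(input_bytes)
    return Execution(tool, version, wiring, sha256_of(input_bytes), sha256_of(output))


if __name__ == "__main__":
    fake_caller = lambda bam: bam.upper()  # stand-in for a real mutation caller
    print(record_execution("mutation_caller", "v41", "exome_capture_bam",
                           b"fake bam bytes", fake_caller))
```

Because the record is frozen and built only from hashes and version strings, re-running the same tool version on the same bytes yields an equal record, which is what makes a tracked result reproducible and attributable.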
35.
Three NCI Genomics Cloud Pilots
Broad Institute
• PI: Gad Getz
• Google Cloud
• Firehose in the cloud, including Broad best-practices workflows
• http://firecloud.org
Institute for Systems Biology
• PI: Ilya Shmulevich
• Google Cloud
• Leverages Google infrastructure; novel query and visualization
• http://cgc.systemsbiology.net/
Seven Bridges Genomics
• PI: Deniz Kural
• Amazon Web Services
• Interactive data exploration; > 30 public pipelines
• http://www.cancergenomicscloud.org
Timeline phases: Selection; Design/Build I; Design/Build II; Evaluation; Extension
Timeline dates: Jan 2014, Sept 2014, April 2015, Jan 2016, Sept 2016
36. Cancer Genomics Cloud Project Teams
CGC Pilot Team Principal Investigators
• Gad Getz, Ph.D. - Broad Institute - http://firecloud.org
• Ilya Shmulevich, Ph.D. - ISB - http://cgc.systemsbiology.net/
• Deniz Kural, Ph.D. - Seven Bridges - http://www.cancergenomicscloud.org
NCI Project Officer & CORs
• Anthony Kerlavage, Ph.D. - Project Officer
• Juli Klemm, Ph.D. - COR, Broad Institute
• Tanja Davidsen, Ph.D. - COR, Institute for Systems Biology
• Ishwar Chandramouliswaran, MS, MBA - COR, Seven Bridges Genomics
GDC Principal Investigators
• Robert Grossman, Ph.D. - University of Chicago
• Allison Heath, Ph.D. - University of Chicago
• Vincent Ferretti, Ph.D. - Ontario Institute for Cancer Research
NCI Leadership Team
• Doug Lowy, M.D.
• Lou Staudt, M.D., Ph.D.
• Stephen Chanock, M.D.
• George Komatsoulis, Ph.D.
• Warren Kibbe, Ph.D.
Center for Cancer Genomics Partners
• JC Zenklusen, Ph.D.
• Daniela Gerhard, Ph.D.
• Zhining Wang, Ph.D.
• Liming Yang, Ph.D.
• Martin Ferguson, Ph.D.
37. The NCI Cancer Research Data Commons
A virtual, expandable infrastructure:
• Standardized data submission and Q/C
• Controlled vocabularies
• Harmonization by subject matter experts
Data: genomic data (GDC), proteomic data, imaging data, clinical, functional, cancer models, population
Users: data contributors; biologists and clinical researchers; clinicians and patients; tool/algorithm developers; computational scientists
Authentication & Authorization
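Standardized submission with controlled vocabularies can be pictured as validating each submitted record against allowed term lists before it enters the commons. The vocabularies, field names, and the `validate_submission` helper below are hypothetical placeholders; the actual NCI data dictionaries are much richer.

```python
# Hypothetical controlled vocabularies; a real commons defines its own data dictionary.
VOCABULARIES = {
    "data_type": {"genomic", "proteomic", "imaging", "clinical"},
    "sample_type": {"primary_tumor", "blood_derived_normal", "recurrent_tumor"},
}


def validate_submission(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record passes Q/C."""
    errors = []
    for fieldname, allowed in VOCABULARIES.items():
        value = record.get(fieldname)
        if value is None:
            errors.append(f"{fieldname}: missing")
        elif value not in allowed:
            errors.append(f"{fieldname}: {value!r} not in controlled vocabulary")
    return errors


if __name__ == "__main__":
    print(validate_submission({"data_type": "Genomic"}))
```

Rejecting free-text variants such as "Genomic" at submission time is what lets harmonized data from many contributors be queried together later.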
38. NCI Cancer Research Data Commons: An Individual Node
[Diagram: Node A across Cloud X and Cloud Y, with data contributors, data submission, and data mirroring]
40. Development of the NCI Genomic Data Commons (GDC) to Foster the Molecular Diagnosis and Treatment of Cancer
• GDC: Bob Grossman, PI (Univ. of Chicago), with the Ontario Inst. Cancer Res. and Leidos
• Institute of Medicine: Toward Precision Medicine (2011)
43. Power Calculation for Cancer Driver Discovery
Discovery of cancer drivers with 2% prevalence:
• Lung adenocarcinoma: + 2,900
• Colorectal: + 1,200
• Ovarian: + 500
Need to resequence >100,000 tumors to identify all cancer drivers at >2% prevalence
Lawrence et al., Nature 2014
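The intuition behind these numbers can be illustrated with a deliberately simplified binomial model: if a driver occurs in a fraction p of tumors, how many tumors must be sequenced before it appears in at least k of them with a target probability? This is not the calculation of Lawrence et al., which also has to separate driver mutations from the background mutation rate and therefore needs larger cohorts; the choice of k, the target probability, and the helper names are assumptions for illustration.

```python
from math import comb


def prob_at_least(n: int, k: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance that at least k of n tumors carry the driver."""
    return 1.0 - sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))


def min_tumors(p: float, k: int, target: float) -> int:
    """Smallest cohort size n with P(X >= k) >= target (P is nondecreasing in n)."""
    n = k
    while prob_at_least(n, k, p) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Simplified assumption: a 2% driver must appear in >= 10 tumors with 90% probability.
    print(min_tumors(0.02, 10, 0.9))
```

Under this toy model the answer is a cohort size for a single tumor type; repeating the exercise across many tumor types, before even accounting for background mutation rates, suggests why the total grows so large.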
44. Cancer Research Data Ecosystem (Cancer Moonshot BRP)
Learning from every cancer patient
• Well-characterized research data sets
• Cancer cohorts
• Patient data: EHR, lab data, imaging, PROs, smart devices, decision support
• Active research participation; research information donor
Components: clinical research, observational studies, proteogenomics, imaging data, clinical trials, discovery, patient-engaged research, surveillance, big data, implementation research, SEER, GDC
45. Improve understanding of the effectiveness of cancer treatment in the “real world” through automation
SEER Precision Cancer Surveillance
Surveillance data captured or planned for each cancer patient, across the entire population:
• Demographics
• Pathology
• Genome
• Molecular characterization
• Detailed initial treatment
• Detailed subsequent treatment
• Progression
• Recurrence
• Survival
• Cause of death
Goals:
• Complement trials to support development of new diagnostics and treatments
• Understand treatment and improve outcomes in the “real world”
47. Data Sharing Pledge …
“leading research centers that have pledged to make genomic & proteomic datasets available to the public to advance cancer care”
10 MOUs / 11 countries / 18 institutions
ICPC (International Cancer Proteogenome Consortium)
48.
Integrated data sets, interoperable resources, and harmonized data are necessary for, and enable, biologically informed computational predictive models of cancer
51.
NIH Genomic Data Sharing Policy
https://gds.nih.gov/
Went into effect January 25, 2015
NCI guidance:
http://www.cancer.gov/grants-training/grants-management/nci-policies/genomic-data
Requires public sharing of genomic data sets