NCI Cancer Genomics Cloud (CGC) Pilots
Steve Tsang
Attain LLC
National Cancer Institute
Disclaimer
The opinions, comments, and assessments expressed in this presentation are the author's own and do not necessarily reflect the views of the National Cancer Institute or the National Institutes of Health.
https://ethics.od.nih.gov/topics/Disclaimer.htm
Cancer Genomic Data Challenges
● More than 2.5 PB of TCGA data (WXS, RNA-Seq, WGS)
● Fragmented repositories of cancer genomic data
○ TCGA, TARGET and CGCI each have their own data repositories (DCCs)
○ Sequencing data: BAM files at CGHub, VCF/MAF files at the DCC
● Assuming the 2.5 PB TCGA data set:
○ Storage and data protection cost approximately $2,000,000 per year
○ Downloading the TCGA data at 10 Gb/s takes about 23 days
○ Only large institutions have the resources to use this data
○ These data types will continue to grow
Slide courtesy of Tanja Davidsen, NCI
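The download figure follows from simple unit conversion. A minimal Python check, assuming decimal units (1 PB = 10^15 bytes, 1 Gb = 10^9 bits) and a sustained 10 Gb/s link with no protocol overhead; the per-TB figure just spreads the slide's $2,000,000 estimate evenly over the data set and is illustrative, not a quoted price:

```python
# Back-of-the-envelope check of the slide's TCGA figures.
# Assumptions: decimal units and an uninterrupted link with no protocol overhead.

PB = 10**15              # bytes per petabyte (decimal)
TB = 10**12              # bytes per terabyte (decimal)
SECONDS_PER_DAY = 86_400


def download_days(dataset_bytes: float, link_gbps: float) -> float:
    """Days needed to move dataset_bytes over a link of link_gbps gigabits per second."""
    bits = dataset_bytes * 8
    seconds = bits / (link_gbps * 10**9)
    return seconds / SECONDS_PER_DAY


def cost_per_tb_year(annual_cost_usd: float, dataset_bytes: float) -> float:
    """Annual storage cost spread evenly over each terabyte of the data set."""
    return annual_cost_usd / (dataset_bytes / TB)


tcga_bytes = 2.5 * PB
print(f"Download time: {download_days(tcga_bytes, 10):.1f} days")              # → Download time: 23.1 days
print(f"Storage: ${cost_per_tb_year(2_000_000, tcga_bytes):,.0f} per TB per year")  # → Storage: $800 per TB per year
```

Under these assumptions the result matches the slide's 23 days; real transfers with overhead and shared bandwidth would take longer, which is the motivation for moving compute to the data rather than the reverse.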
Cloud Pilots Concept: Co-located Compute & Data
Three Cancer Genomics Cloud Pilot Awardees
http://firecloud.org
FireCloud Concepts
● Data Files reside in Google Cloud Storage
● Workspaces
● Tasks and Workflows
● Method Repositories
● Provenance captured for every analysis run (i.e., which version of which method was run on which data, and when)
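The provenance idea (which version of which method ran on which data, and when) maps naturally onto an immutable record. A minimal Python sketch; the `ProvenanceRecord` class, its field names, the workflow name `mutect-somatic`, and the `gs://example-bucket/...` paths are all hypothetical and do not reflect FireCloud's actual schema:

```python
# Hypothetical provenance record: field names are illustrative, not FireCloud's schema.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)  # frozen: a record cannot be edited after the run is logged
class ProvenanceRecord:
    method_name: str              # workflow/method from a method repository
    method_version: int           # snapshot of the method that actually ran
    input_uris: Tuple[str, ...]   # data the run consumed (e.g. gs:// paths)
    started_at: datetime          # launch time, in UTC


def record_run(method_name: str, method_version: int, *input_uris: str) -> ProvenanceRecord:
    """Capture a provenance entry at launch time, stamped with the current UTC time."""
    return ProvenanceRecord(method_name, method_version, tuple(input_uris),
                            datetime.now(timezone.utc))


run = record_run("mutect-somatic", 3,
                 "gs://example-bucket/tumor.bam", "gs://example-bucket/normal.bam")
print(run.method_name, run.method_version, len(run.input_uris))  # → mutect-somatic 3 2
```

Making the record frozen is a design choice in this sketch: provenance is only useful for reproducibility if it cannot be silently rewritten after the fact.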
FireCloud Overview
● The Workspace is the organizing principle of FireCloud
○ When a workspace is created, a Google bucket is automatically attached to it
● The Data Model is the backbone of the workspace
○ Holds metadata and bucket pointers to input and output files
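The relationship between a workspace, its bucket, and its data model can be pictured with a small Python sketch. Everything here is hypothetical: the `Workspace` class, its methods, the `fc-` bucket prefix, and the sample paths are illustrative assumptions, not FireCloud's actual API or naming scheme:

```python
# Hypothetical model of a workspace: a bucket is attached on creation, and the
# data model holds metadata plus gs:// pointers into that bucket.
import uuid
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Workspace:
    namespace: str
    name: str
    # Filled in automatically in __post_init__, mirroring "bucket attached on creation".
    bucket: str = field(init=False, default="")
    # Data model: entity id -> attributes (metadata values or gs:// file pointers).
    data_model: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Invented naming scheme: a random UUID keeps bucket names unique.
        self.bucket = f"fc-{uuid.uuid4()}"

    def add_entity(self, entity_id: str, **attributes: str) -> None:
        """Register an entity (e.g. a sample) with its metadata and input pointers."""
        self.data_model[entity_id] = dict(attributes)

    def add_output(self, entity_id: str, attribute: str, filename: str) -> str:
        """Store an output file pointer for an entity, inside the workspace bucket."""
        uri = f"gs://{self.bucket}/{entity_id}/{filename}"
        self.data_model[entity_id][attribute] = uri
        return uri


ws = Workspace("my-lab", "pilot-demo")
ws.add_entity("sample-01", tumor_type="BRCA", bam="gs://example-input/sample-01.bam")
vcf = ws.add_output("sample-01", "somatic_vcf", "calls.vcf")
print(vcf.startswith(f"gs://{ws.bucket}/sample-01/"))  # → True
```

In this sketch the large files live only in cloud storage, while the data model stays small because it stores pointers rather than file contents.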
http://cgc.systemsbiology.net/
… is to make TCGA data, together with tools and compute power, available and accessible to a broad range of users through multiple access modes:
❏ Interactive web application
❏ Scripting languages: R, Python, SQL
❏ Direct programmatic access
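For the SQL access mode, the typical pattern is a cohort query that runs next to the data and returns only a small summary. The sketch below uses Python's built-in `sqlite3` purely as a local stand-in for a cloud SQL service (for example a data warehouse such as Google BigQuery); the `somatic_mutations` table, its columns, the project names, and the rows are invented toy data, not ISB-CGC's schema or real TCGA results:

```python
# Illustrative only: sqlite3 stands in for a cloud SQL service; the schema and
# rows below are invented toy data.
import sqlite3


def mutated_case_counts(conn: sqlite3.Connection, gene: str) -> list:
    """Count distinct cases per project that carry at least one mutation in `gene`."""
    query = """
        SELECT project, COUNT(DISTINCT case_id) AS n_cases
        FROM somatic_mutations
        WHERE gene_symbol = ?
        GROUP BY project
        ORDER BY n_cases DESC, project
    """
    return conn.execute(query, (gene,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE somatic_mutations (project TEXT, case_id TEXT, gene_symbol TEXT)")
conn.executemany(
    "INSERT INTO somatic_mutations VALUES (?, ?, ?)",
    [
        ("PROJECT-A", "case-1", "TP53"),
        ("PROJECT-A", "case-1", "TP53"),  # second hit in the same case is counted once
        ("PROJECT-A", "case-2", "TP53"),
        ("PROJECT-B", "case-3", "TP53"),
        ("PROJECT-B", "case-4", "KRAS"),
    ],
)
print(mutated_case_counts(conn, "TP53"))  # → [('PROJECT-A', 2), ('PROJECT-B', 1)]
```

Only the per-project counts leave the database, which is the point of co-locating queries with the data.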
❏ Build an open platform that can grow and evolve to serve a broad range of users and use cases
❏ Leverage the best existing tools and technologies as they are released
❏ Collaborate with the research community on data standards, containers, workflows, etc.
❏ Provide a range of examples and tutorials to get newcomers up and running quickly
http://www.cancergenomicscloud.org/
❖The CGC aims to provide a collaborative environment where researchers can take advantage of co-located public data (such as TCGA) and public tools, and also combine them with their own private data and tools.
❖Guiding Principles
➢ Making data available isn’t enough to make it usable.
➢ The best science happens in teams.
➢ Reproducibility shouldn’t be hard.
➢ The impact of TCGA is extended by new data & tools.
Seven Bridges Genomics CGC Objectives
❖Explore processed TCGA data for mutations, copy number variations and expression levels.
❖Analyze data from their private cohorts alongside TCGA data.
❖Use standard bioinformatics pipelines to perform analyses.
❖Bring their own analysis tools directly to the TCGA dataset.
❖Collaborate with researchers around the world.
❖Access storage and compute resources on the cloud on demand.
❖Access the CGC using the API as
Seven Bridges Genomics CGC Features
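The API objective refers to programmatic access to the platform. A hedged sketch of what an authenticated REST call might look like, using only Python's standard library: the base URL `https://cgc-api.sbgenomics.com/v2`, the `projects` path, and the `X-SBG-Auth-Token` header are assumptions based on Seven Bridges' public API conventions and should be checked against the current CGC API documentation; `YOUR_TOKEN` is a placeholder:

```python
# Sketch of an authenticated REST call to the CGC API (standard library only).
# ASSUMPTIONS to verify against current documentation: base URL, "projects"
# path, and the X-SBG-Auth-Token header name.
import json
import urllib.request

API_BASE = "https://cgc-api.sbgenomics.com/v2"  # assumed base URL


def build_request(path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for an API path such as 'projects'."""
    url = f"{API_BASE}/{path.lstrip('/')}"
    return urllib.request.Request(
        url,
        headers={"X-SBG-Auth-Token": token, "Accept": "application/json"},
        method="GET",
    )


def list_projects(token: str) -> dict:
    """Send the request and decode the JSON body (needs network access and a valid token)."""
    with urllib.request.urlopen(build_request("projects", token), timeout=30) as resp:
        return json.load(resp)


req = build_request("projects", "YOUR_TOKEN")
print(req.full_url)  # → https://cgc-api.sbgenomics.com/v2/projects
```

Only the request construction runs here; `list_projects` is left uncalled because it requires network access and a personal token issued by the platform.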
Acknowledgement
Team CGC - https://goo.gl/f21Lqq
National Cancer Institute CBIIT
CGC fact sheet - https://cbiit.nci.nih.gov/sites/nci-cbiit/files/Cloud_Pilot_Handout.pdf
Access the Cloud Pilots - https://cbiit.nci.nih.gov/ncip/nci-cancer-genomics-cloud-pilots/access-the-cloud-pilot-platforms
Broad Institute - FireCloud - http://firecloud.org
Institute for Systems Biology - Cancer Genomics Cloud - http://cgc.systemsbiology.net/
Seven Bridges Genomics - Cancer Genomics Cloud - http://www.cancergenomicscloud.org/
Attain, LLC - http://www.attain.com/

Editor's Notes

  1. This is good, but I would focus on how the native Google platform has been fully exploited: BigQuery and Google Genomics in addition to Google Cloud Storage.
  2. It would be nice to have a visual of the case explorer or something similar. Do you plan to explain why there are three pilots and what was uniquely evaluated in each? Also, do you plan a concluding slide covering: - next steps from the program's perspective and how these would become part of the Commons vision, or something like that; - a call to action for those who want to use the pilots to access cancer data, the availability of free credits, and/or how to replicate the pilots for their ICs using the platforms' open-source code.