Presentation to the Department of Biology at the University of Windsor, Windsor, Ontario, describing and updating activities related to the International Cancer Genome Consortium (ICGC).
Biocuration activities for the International Cancer Genome Consortium (ICGC) - Neuro, McGill University
The document discusses biocuration activities for the International Cancer Genome Consortium (ICGC). It provides information on the goals of ICGC including comprehensively analyzing 50 different cancer types/subtypes and making the genomic and clinical data publicly available. It describes the types of data being collected, standards being developed for data access and sharing, and current status of datasets released.
International Cancer Genome Consortium (ICGC) Data Coordinating Center - Neuro, McGill University
The document is a presentation slide deck for the International Cancer Genome Consortium (ICGC) Data Coordinating Center (DCC) given on November 14, 2013. It provides an overview of the ICGC, including its goals to catalog genomic abnormalities in 50 different cancer types using comprehensive genome, transcriptome, methylome, and clinical data analysis. It describes the activities of the ICGC DCC, which provides tools and infrastructure for data uploading, tracking, quality control, and distribution. The DCC aims to make ICGC data accessible and useful to researchers through search and analysis capabilities on its data portal.
Presentation at the Canadian Cancer Research Conference satellite bioinformatics.ca workshop. This one is an introduction to the TCGA, ICGC, and COSMIC databases.
This document provides a status update and overview of the International Cancer Genome Consortium (ICGC). The ICGC aims to sequence 500 tumor/normal pairs from each of 50 different cancer types to identify genome changes and make the data available for research. It coordinates cancer genome projects internationally to maximize data collection while minimizing duplication of effort. The ICGC has established policies for data access, publication, and intellectual property. To date it has sequenced over 12,000 cancer genomes through 55 projects across 18 jurisdictions. The ICGC Data Coordination Center manages data submission and access and provides portals and tools for searching and accessing datasets.
The document provides information about a workshop on cancer genomic databases, including The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and the Catalogue of Somatic Mutations in Cancer (COSMIC). It summarizes the goals, data access, and analysis tools available for each database. It also discusses controlled access vs open data and the process for applying for access to controlled TCGA and ICGC genomic and clinical data.
Cancer genome databases & Ecological databases - Waliullah Wali
Introduction
Biological databases are libraries of life sciences information, collected from scientific experiments, published literature, high-throughput experiment technology, and computational analysis.
Information contained in biological databases includes gene function, structure, localization, clinical effects of mutations as well as similarities of biological sequences and structures.
Cancer genome databases
COSMIC cancer database
COSMIC cancer database
COSMIC is an online database of somatically acquired mutations found in human cancer.
The database is freely available.
COSMIC cancer database
Types of data
Expert curation data
Genome-wide screen data
COSMIC cancer database
Expert curation data
Manually input by COSMIC expert curators.
Consists of comprehensive literature curation followed by subsequent updates.
Includes additional data points relevant to each disease and publication.
Provides accurate frequency data, as mutation-negative samples are specified.
COSMIC cancer database
Genome-wide screen data
Uploaded from publications reporting large-scale genome screening data or imported from other databases such as TCGA and ICGC.
Provides unbiased molecular profiling of diseases across the whole genome.
Provides objective frequency data, since non-mutated genes across each genome are also recorded.
Facilitates the discovery of novel cancer driver genes.
To access the COSMIC cancer database, type http://cancer.sanger.ac.uk/cosmic into the browser's address bar.
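COSMIC pages can also be reached programmatically by building the search URL rather than typing it by hand. A minimal sketch, assuming the site's search form takes a `search` path with a `q` query parameter (an illustrative assumption, not a documented API):

```python
from urllib.parse import urlencode, urljoin

COSMIC_BASE = "http://cancer.sanger.ac.uk/cosmic/"

def cosmic_search_url(term: str) -> str:
    """Build a COSMIC search URL for a gene symbol or keyword.

    The 'search' path and 'q' parameter mirror the site's search box
    and are assumptions for illustration, not a documented API.
    """
    return urljoin(COSMIC_BASE, "search") + "?" + urlencode({"q": term})

print(cosmic_search_url("TP53"))
```

Opening the printed URL in a browser leads to the same gene pages as the manual search described above.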
Searching Process
Examples
Ecological databases
Ecological databases
Ecological databases are sources for finding ecological datasets and quickly determining the best ways to use them.
BioOne
DataONE
GEOBASE
BioOne
BioOne is a nonprofit publisher that aims to make scientific research more accessible.
BioOne was established in 1999 in Washington, DC.
Its full-text collection, BioOne Complete, includes open-access content.
It serves a community of over 140 society and institutional publishers, 4,000 accessing institutions, and millions of researchers worldwide.
To access the BioOne ecological database, type http://www.bioone.org/ into the browser's address bar.
This document provides permissions for sharing and reusing the content of a presentation. It states that the presentation can be:
1) Copied, shared, adapted, or remixed.
2) Photographed, filmed, or broadcast.
3) Blogged about, live-blogged, or have videos posted.
As long as the work is attributed to its author and respects any rights and licenses associated with its components. One slide was created by Cameron Neylon and is available under a CC0 license. Social media icons were adapted from another source with permission.
Steve Rozen's keynote talk at IEEE CIBCB 2016
Big Genome Data Sheds Light on Cancer Causes
Steven G. Rozen, PhD
Professor, Cancer & Stem Cell Programme, Duke-NUS Medical School, Singapore
Director, Duke-NUS Centre for Computational Biology
The last eight years have seen a revolution in the availability of DNA sequencing data. This revolution has been driven by costs that have plummeted from US$10 million per human genome in 2008 to US$1,200 today. Abundant sequencing data brings with it a previously unimaginable range of research possibilities in all areas of biomedical research. Naturally, these research possibilities make heavy demands on computation and data storage, because the cost of sequencing is falling much faster than Moore's law. In this talk I will present a high-level overview of these computational demands. I will then go into detail on a few of the cancer-related big-data projects my lab is working on. One of these is "mutation signature analysis", which has important applications in cancer prevention and epidemiology, and in research into the fundamental processes by which cancers arise. One example of the importance of this approach is the recent finding that a highly mutagenic herbal remedy is implicated in many more geographical regions and types of cancer than suspected a few years ago.
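Mutation signature analysis begins by classifying each single-base substitution by its trinucleotide context (the mutated base plus its flanking bases), conventionally normalized so the reference base is a pyrimidine (C or T), yielding 96 classes. A minimal sketch of that classification step (the example contexts are made-up inputs):

```python
# Map each base to its Watson-Crick complement for strand normalization.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def substitution_class(ref_context: str, alt: str) -> str:
    """Classify a single-base substitution by trinucleotide context.

    ref_context: 3-base reference sequence centred on the mutated base.
    alt: the alternate (mutant) base.
    Returns e.g. 'A[C>T]G', normalized so the central reference
    base is a pyrimidine (C or T), as in the 96-class convention.
    """
    assert len(ref_context) == 3
    if ref_context[1] in "AG":  # purine reference: flip to the other strand
        ref_context = ref_context.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    return f"{ref_context[0]}[{ref_context[1]}>{alt}]{ref_context[2]}"

print(substitution_class("ACG", "T"))  # A[C>T]G
print(substitution_class("TGA", "C"))  # T[C>G]A (reverse-complemented)
```

Counting these classes across a tumor's mutations produces the 96-element profile that signature-extraction methods then decompose.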
Presentation for teaching faculty about resources, data, issues, and strategies for including personal genomics in the classroom, within the context of precision medicine as an overarching theme.
Learning, Training, Classification, Common Sense and Exascale Computing - Joel Saltz
In this talk, I will describe work my group has carried out in development of deep learning methods that target semantic segmentation and object identification tasks in terapixel Pathology datasets and for satellite data. I will describe what we have been able to achieve, how this work can generalize to additional types of problems and will outline how exascale computing could be used to transform and integrate our methods and pipelines. I will then go on to outline broad research program in exascale computing and deep learning that promises to identify common deep learning methods for previously disparate large and extreme scale data tasks.
The Application of Next Generation Sequencing (NGS) in cancer treatment - Premadarshini Sai
Next-generation sequencing (NGS) has several advantages for cancer treatment including high throughput sequencing, screening of multiple genes simultaneously, and decreased costs. NGS faces challenges from complex data analysis and validation of new technologies. Key clinical applications of NGS include whole genome sequencing, transcriptome analysis via RNA-seq, and sequencing of cell-free DNA. Future areas of development include immunotherapy, epigenetics research, and using circulating tumor cells to detect early relapse. More research is still needed to fully realize the potential of NGS in personalized cancer treatment.
- The document discusses the Total Cancer Care (TCC) approach at Moffitt Cancer Center, which aims to provide personalized cancer care through comprehensive data collection and analysis.
- TCC collects extensive clinical, genomic, treatment and outcomes data from over 78,000 consented patients to power research studies and clinical trials matching. Molecular profiling has been conducted on over 14,000 tumor samples.
- The TCC data is housed in a large integrated database and used by researchers for studies in areas like radiochemotherapy response, exome sequencing, immunology biomarkers, and cancer epidemiology.
- The database also helps clinicians identify eligible patients for clinical trials and develop evidence-based treatment pathways. The goal is to transform cancer
This document discusses Moffitt Cancer Center's Total Cancer Care program which aims to transform cancer care through a personalized approach. It involves collecting extensive clinical, molecular, and biospecimen data from patients over their lifetime to power research. The goals are to improve outcomes through early detection, personalized treatment, and clinical trials matching. Moffitt has established an extensive biorepository and informatics platform to integrate data from over 78,000 consented patients to enable precision oncology research.
Integrative Everything, Deep Learning and Streaming Data - Joel Saltz
Workshop on Clusters, Clouds, and Data for Scientific Computing, September 6, 2018
The need to label information and segment regions in individual sensor data sources, and to create syntheses from multiple disparate data sources, spans many areas of science, biomedicine, and technology. The rapid evolution of sensor technologies, from digital microscopes to UAVs, drives requirements in this area. I will describe a variety of use cases and technical challenges, as well as tools, algorithms, and techniques developed by our group and collaborators.
Big Data and Genomic Medicine by Corey Nislow - Knome_Inc
View the webinar at: http://www.knome.com/webinar-big-data-genomic-medicine. This presentation covers an overview of genomic medicine, requirements and challenges of next-generation sequencing, bottlenecks to broader healthcare adoption, and why “we want to sequence everyone.”
The Global Microbial Identifier (GMI) initiative - and its working groups - ExternalEvents
http://www.fao.org/about/meetings/wgs-on-food-safety-management/en/
The GMI initiative - and its working groups. Presentation from the Technical Meeting on the impact of Whole Genome Sequencing (WGS) on food safety management, 23-25 May 2016, Rome, Italy.
The document outlines plans to transition the cBioPortal cancer genomics platform to an open source model with coordinated development between Memorial Sloan Kettering Cancer Center, Dana-Farber Cancer Institute, and Princess Margaret Cancer Centre. It discusses expanding usage, new features, funding options, and establishing an advisory committee. The goal is to build a sustainable open source community through collaborative development, additional funding, and engagement with users and potential contributors.
Next generation sequencing in cancer treatment - MarliaGan
Next-generation sequencing (NGS) provides several advantages over first generation sequencing methods. NGS allows large amounts of genomic information to be sequenced in parallel at a lower cost. NGS has various applications in cancer treatment including predicting cancer progression, identifying drug targets and resistance mutations, detecting minimal residual disease, and improving cancer classification. While powerful, NGS also faces limitations such as tissue heterogeneity, complexity of data analysis, and difficulties identifying driver mutations as cancers evolve with treatment.
Additional value of prenatal genomic array testing in fetuses with isolated structural ultrasound abnormalities and a normal karyotype: a systematic review of the literature
M.C. de Wit, M.I. Srebniak, L.C.P. Govaerts, D. Van Opstal, R.J.H. Galjaard and A.T.J.I. Go
Link to free access article: http://onlinelibrary.wiley.com/doi/10.1002/uog.12575/abstract
HDx™ Reference Standards and Reference Materials for Next Generation Sequenci... - Candy Smellie
This document summarizes a presentation about reference standards for next generation sequencing (NGS). Horizon Diagnostics has developed genomic DNA and formalin-fixed, paraffin-embedded (FFPE) reference standards containing defined mutations at known allelic frequencies to validate NGS workflows and monitor assay performance. Multiplex reference standards contain up to 40 mutations at low allelic frequencies down to 1.3% that can be quantified using digital PCR. Several laboratories demonstrated they could accurately detect the mutations in Horizon's reference standards using different NGS platforms. The standards help evaluate sensitivity, specificity, and limits of detection on NGS assays.
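Reference standards like these are specified in terms of variant allele frequency (VAF): the fraction of reads at a locus that carry the mutant allele. A minimal sketch of the calculation (the read counts are made-up numbers, chosen to land on the 1.3% low end quoted above):

```python
def variant_allele_frequency(alt_reads: int, ref_reads: int) -> float:
    """Fraction of reads supporting the alternate allele at a locus."""
    total = alt_reads + ref_reads
    if total == 0:
        raise ValueError("no reads cover this locus")
    return alt_reads / total

# 13 mutant reads out of 1,000 total reads covering the locus
print(f"{variant_allele_frequency(13, 987):.1%}")  # 1.3%
```

Detecting alleles at such low frequencies is what makes deep coverage and well-characterized reference materials necessary for validating an NGS assay's limit of detection.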
Stephen Friend Institute of Development, Aging and Cancer 2011-11-29 - Sage Base
The document proposes a new approach called Arch2POCM for drug development that moves from disease targets to clinical validation. It discusses issues with the current drug discovery process, noting $200 billion is spent annually but only a handful of new medicines are approved each year while productivity is declining. Arch2POCM would require a more data-driven and collaborative approach involving scientists, clinicians, and citizens to better link knowledge and accelerate eliminating human disease. It presents the mission of Sage Bionetworks to create a commons for evolving integrative networks to map diseases and enable discovery.
Hao Liu has over 15 years of experience in drug discovery research. He has developed biochemical and cell-based assays to screen for oncology and metabolic disease targets. He is skilled in developing and optimizing high throughput screening assays, cell signaling studies, and molecular biology techniques. Liu has authored several publications and presented at conferences.
This document discusses next-generation sequencing and its applications in genomics and pathology. It begins with an overview of common NGS terms and technologies. It then covers the typical NGS analysis workflow including quality control, mapping reads to a reference genome, variant calling and annotation. Challenges such as data storage, sharing and reporting are also addressed. The document concludes that clinical sequencing is becoming established but requires ongoing collaboration between pathologists, geneticists and bioinformaticians to realize its potential.
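The quality-control step in the NGS workflow described above typically inspects per-base Phred quality scores, which FASTQ files encode as ASCII characters offset by 33. A minimal sketch of decoding them for one read (the four-line record shown is a made-up example):

```python
def mean_phred_quality(quality_string: str, offset: int = 33) -> float:
    """Mean Phred quality of a read from its FASTQ quality string.

    Each character encodes quality as ord(char) - offset; 33 is the
    standard Sanger/Illumina 1.8+ offset.
    """
    scores = [ord(c) - offset for c in quality_string]
    return sum(scores) / len(scores)

# Four-line FASTQ record: header, bases, separator, quality string
record = "@read1\nACGTACGT\n+\nIIIIIIII"
qual = record.split("\n")[3]
print(mean_phred_quality(qual))  # 'I' encodes Phred 40 -> 40.0
```

Reads or bases falling below a quality threshold are trimmed or discarded before the mapping and variant-calling steps.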
Next generation sequencing (NGS) has various applications in cancer treatment and research. It can be used to identify novel cancer mutations, detect hereditary cancer syndromes, enable personalized cancer treatment based on a patient's genetic profile, and detect circulating tumor DNA (ctDNA). NGS allows comprehensive analysis of cancer genomes and biomarkers for molecular diagnosis, prognosis, and monitoring treatment response. Challenges include analyzing large amounts of NGS data and accurately interpreting genetic variations, but its clinical utility continues to advance personalized cancer care.
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat... - Nathan Olson
"Next Generation Sequencing for Identification and Subtyping of Foodborne Pathogens" presentation at the Standards for Pathogen Identification via NGS (SPIN) workshop hosted by the National Institute for Standards and Technology October 2014 by Rebecca Lindsey, PhD from Enteric Diseases Laboratory Branch of the CDC.
FDA NGS and Big Data Conference September 2014 - Warren Kibbe
The document discusses the National Cancer Institute's efforts to address challenges in cancer data access and analysis through the development of the NCI Genomics Data Commons and NCI Cloud Pilots. The NCI Genomics Data Commons will provide integrated genomic and clinical cancer data from projects like TCGA to researchers. The NCI Cloud Pilots aim to explore cloud-based models for analyzing large cancer genomics datasets without having to download the full datasets locally, helping to enable more widespread data access and analysis. The goal is to build a national learning health system for cancer clinical genomics through open data sharing and cloud-based approaches.
Federal Research & Development for the Florida system Sept 2014 Warren Kibbe
This document discusses challenges in cancer data integration and analysis. It proposes the development of open science models, standardized data elements, and sustainable informatics infrastructure. Emerging technologies like mobile devices, social media, and cloud computing create opportunities to build a national "learning health system" for cancer. The National Cancer Institute is pursuing initiatives like the Cancer Genomics Data Commons and cloud pilots to leverage large genomic and clinical datasets using these technologies and develop predictive models to improve outcomes. The ultimate goal is a system that facilitates data sharing, continuous learning from all cancer patients, and personalized, predictive oncology.
EBI Industry programme TCGA Warren KIbbe November 2013Warren Kibbe
This document discusses strategic objectives and activities of the National Cancer Institute's Center for Biomedical Informatics and Information Technology (NCI CBIIT). The key objectives are to reduce cancer risk, improve cancer outcomes, provide cancer information to the public, and enable precision oncology through data access and modeling. Specific activities mentioned include the Genomic Data Commons, cloud computing initiatives, clinical trials repositories, and The Cancer Genome Atlas (TCGA) project. TCGA has collected over 700 terabytes of genomic and clinical data on 20+ cancer types to date. The data provides a platform for understanding cancer drivers, molecular subtypes of cancers, and the implications of data sharing policies.
- National challenges in cancer research include lowering barriers to data access and analysis, and integrating clinical and basic research data to enable improved outcomes.
- Disruptive technologies like high-throughput biology and ubiquitous computing are generating large amounts of molecular and clinical cancer data.
- The NCI is working to build infrastructure like the Genomics Data Commons and Cloud Pilots to make these data widely accessible and support data analysis.
- The goal is to develop a national "learning health system" that applies insights from real-world cancer data to research and clinical practice to continuously improve patient care and outcomes.
NCI Cancer Genomics, Open Science and PMI: FAIR Warren Kibbe
Talk given to the NLM Fellows on July 8, 2016. Touches on Cancer Genomics, Open Science and PMI: FAIR in NCI genomics thinking and projects. Includes discussion of the Genomic Data Commons (GDC), Cancer Data Ecosystem, Data sharing, and the NCI cancer clinical trials open API.
Proteomics Modules designed to bring clinically relevant data, at any point, into the Drug Discovery Process. 1000s of proteins are plated from primary cells and are used to trap autoantibodies from diseased patients' blood sera. Results put a spotlight on highest probability targets.
2016 Data Commons and Data Science Workshop June 7th and June 8th 2016. Genomic Data Commons, FAIR, NCI and making data more findable, publicly accessible, interoperable (machine readable), reusable and support recognition and attribution
Workshop finding and accessing data - fiona - lunteren april 18 2016Fiona Nielsen
Workshop presentation on finding and accessing human genomics data for research.
Including statistics of publicly available data sources and tips on how to save time in your workflow of data access.
Presented at BioSB2016, pre-conference PhD retreat for young researchers in bioinformatics and systems biology at Congrescentrum De Werelt in Lunteren. #BioSB2016 #BioSB16
Link to event:
http://www.youngcb.nl/events/biosb-phd-retreat-2016/
Read more about my work:
http://DNAdigest.org
http://repositive.io
https://uk.linkedin.com/in/fionanielsen
Genomic epidemiology uses whole genome sequencing data from pathogens combined with epidemiological investigations to track the spread of infectious diseases. The document discusses making genomic epidemiology a widespread reality in public health. It outlines key requirements including building a user-friendly analysis platform, developing portable analysis pipelines, providing training to public health personnel, and improving information sharing between organizations.
How Can We Make Genomic Epidemiology a Widespread Reality? - William HsiaoWilliam Hsiao
The document discusses genomic epidemiology and the requirements to bring genomic sequencing into routine public health practice. It outlines two parts: (1) what genomic epidemiology is and why it is important; and (2) the requirements for genomic sequencing to be used routinely in public health. Whole genome sequencing is seen as a way to generate high quality pathogen genomes quickly and allow for more detailed tracking of disease spread compared to traditional methods. However, bringing genomic sequencing into public health practice requires overcoming barriers such as the need for user-friendly analysis platforms, training public health personnel in genomics, and improving information sharing between organizations.
Will Biomedical Research Fundamentally Change in the Era of Big Data?Philip Bourne
This document discusses how biomedical research may fundamentally change in the era of big data. It notes that biomedical research has always been data-driven, but the scope, variety, complexity and volume of data is now much greater. It also discusses the need for more open data sharing and new tools and methods for large-scale analysis. The document suggests biomedical research may move towards a more collaborative "platform" model, as seen with companies like Airbnb, with the goal of improving data access, reuse and reproducibility of research. However, overcoming challenges like incentives, trust and work practices will be important for any new platform to succeed.
Data-integration platform for cancer research:cBioPortal demoCORBEL
Participants will be introduced to the data-integration platform cBioPortal. Here, different sources of research data (clinical, imaging, biosample and experimental) of a study are integrated, enabling viewing, querying and analysis.
This webinar is aimed at data managers, researchers, PhD students and postdocs involved in clinical, translational and biomedical research.
Improvements in sequencing technologies have led to a deluge of genomics data in many fields of research. Specifically, the increasing size of cancer-related genomics datasets require comprehensive software solutions that remain accessible to clinical researchers. Clearly, there is an obvious need for tools that integrate genomics and other molecular biology results with the phenotypic and clinical outcome data. During this webinar, the cBio Cancer Genomics Portal (cBioPortal) will be introduced through a practical use case.
The cBioPortal is an open source data integration platform that enables researchers to view, query, analyse and share complex genomic cancer datasets in a user-friendly manner. The platform was originally developed by Memorial Sloan Kettering Cancer Center (New York, USA) and is actively maintained and further developed by an international community. The original instance of cBioPortal (http://cbioportal.org) currently provides access to data from almost 83,000 tumor samples from 273 public studies.
The demo will include:
· short introduction on the FAIR principles (Findable, Accessible, Interoperable, Reusable)
· navigation through a public study on the data-integration platform cBioPortal
· recreation of select plots from publications of interest using cBioPortal functionalities
The CORBEL webinar series aims to address challenges and share best practice between biological and medical research infrastructures. The series is aimed at technical operators of RIs and is aligned with the CORBEL competency framework.
Clinical trial data wants to be free: Lessons from the ImmPort Immunology Dat...Barry Smith
Presentation to the Clinical and Research Ethics Seminar, Clinical and Translational Science Center, Buffalo, January 21, 2014
https://immport.niaid.nih.gov/
http://youtu.be/booqxkpvJMg
Cancer Moonshot, Data sharing and the Genomic Data CommonsWarren Kibbe
Gave the inaugural Informatics Grand Rounds at City of Hope on September 8th. NIH Commons, Genomic Data Commons, NCI Cloud Pilots, Cancer Moonshot and rationale for changing incentives around data sharing all discussed.
A Vision for a Cancer Research Knowledge SystemWarren Kibbe
The document discusses a vision for a cancer research knowledge system that utilizes data commons and cloud platforms. It describes how data commons co-locate data, storage, computing and tools to create interoperable resources for researchers. The Genomic Data Commons aims to make over 30,000 cancer cases FAIR (Findable, Accessible, Interoperable, Reusable) and provide attribution. This will help identify rare cancer drivers and factors influencing therapy response. The system incorporates multiple data types from studies and clinical trials to enable precision medicine approaches.
Advancing Innovation and Convergence in Cancer Research: US Federal Cancer Mo...Jerry Lee
Special Seminar at the 8th Taiwan Biosignatures Workshop to share overall work of NCI's Center for Strategic Scientific Initiatives since 2003 as well as CSSI's influence on select projects initiated by the 2016 WH Cancer Moonshot Task Force that include Applied Proteogenomics Organizational Learning and Outcomes (APOLLO) network, International Cancer Proteogenome Consortium, and the Blood Profiling Atlas in Cancer (BloodPAC) commons.
International perspective for sharing publicly funded medical research dataARDC
Presentation by Olivier Salvado, CSIRO, to the 'Unlocking value from publicly funded Clinical Research Data' workshop, cohosted by ARDC and CSIRO at ANU on 6 March 2019.
Nov 2014 ouellette_windsor_icgc_final
1. A project status for the International Cancer Genome Consortium (ICGC).
November 21st, 2014
B.F. Francis Ouellette francis@oicr.on.ca
• Senior Scientist & Associate Director, Informatics and Biocomputing, Ontario Institute for Cancer Research, Toronto, ON
• Associate Professor, Department of Cell and Systems Biology, University of Toronto, Toronto, ON.
@bffo
2. 2
You are free to:
Copy, share, adapt, or re-mix;
Photograph, film, or broadcast;
Blog, live-blog, or post video of;
This presentation. Provided that:
You attribute the work to its author and respect the rights
and licenses associated with its components.
Slide Concept by Cameron Neylon, who has waived all copyright and related or neighbouring rights. This slide only ccZero.
Social Media Icons adapted with permission from originals by Christopher Ross. Original images are available under GPL at:
http://www.thisismyurl.com/free-downloads/15-free-speech-bubble-icons-for-popular-websites
3. 3
But first, a little about me …
… an unfinished story!
39. 39
Cancer
A Disease of the Genome
Challenge in Treating Cancer:
Every tumor is different
Every cancer patient is different
40. 40
Large-Scale Studies of Cancer Genomes
Johns Hopkins
> 18,000 genes analyzed for mutations
11 breast and 11 colon tumors
L.D. Wood et al, Science, Oct. 2007
Wellcome Trust Sanger Institute
518 genes analyzed for mutations
210 tumors of various types
C. Greenman et al, Nature, Mar. 2007
TCGA (NIH)
Multiple technologies
brain (glioblastoma multiforme), lung (squamous carcinoma), and ovarian (serous cystadenocarcinoma).
F.S. Collins & A.D. Barker, Sci. Am, Mar. 2007
41. 41
Lessons learned
Heterogeneity within and across tumor types
High rate of abnormalities (driver vs passenger)
Sample quality matters
Consent and controlled data access is complicated
42. 42
International Cancer Genome Consortium
• Collect ~500 tumour/normal pairs from each of 50 different major cancer types;
• Comprehensive genome analysis of each T/N pair:
– Genome
– Transcriptome
– Methylome
– Clinical data
• Make the data available to the research community & public.
Identify genome changes:
…GATTATTCCAGGTAT… …GATTATTGCAGGTAT… …GATTATTGCAGGTAT…
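The sequence fragments on the slide illustrate the core comparison: line up the tumour sequence against the matched normal and flag positions where they disagree. A minimal sketch in Python, using the fragments shown above (the function name is mine; real pipelines align millions of reads and apply quality filters before calling a variant):

```python
# Compare a matched tumour/normal sequence pair position-by-position and
# report candidate somatic single-nucleotide variants. Toy illustration only.

def somatic_snvs(normal: str, tumour: str):
    """Yield (position, normal_base, tumour_base) for mismatching positions."""
    for i, (n, t) in enumerate(zip(normal, tumour)):
        if n != t:
            yield (i + 1, n, t)  # 1-based position within the fragment

normal = "GATTATTCCAGGTAT"   # fragment shown on the slide
tumour = "GATTATTGCAGGTAT"   # same fragment from the tumour sample

print(list(somatic_snvs(normal, tumour)))  # [(8, 'C', 'G')]
```

Here the single C→G change at position 8 is the "genome change" the consortium sets out to catalogue at scale.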
43. 43
Rationale for the ICGC
• The scope is huge, such that no country can do it all.
• Coordinated cancer genome initiatives will reduce duplication of effort for common and easy-to-acquire tumor samples, and ensure complete studies for many less frequent forms of cancer.
• Standardization and uniform quality measures across studies will enable the merging of datasets, increasing power to detect additional targets.
• The spectrum of many cancers varies across the world because of environmental, genetic and other causes.
• The ICGC will accelerate the dissemination of genomic and analytical methods across participating sites and the user community.
44. 44
International Cancer Genome Consortium (ICGC)
Goals
• Catalogue genomic abnormalities in tumors in 50 different cancer types and/or subtypes of clinical and societal importance across the globe
• Generate complementary catalogues of transcriptomic and epigenomic datasets from the same tumors
• Make the data available to the research community rapidly with minimal restrictions to accelerate research into the causes and control of cancer
50 tumor types and/or subtypes
500 tumors + 500 controls per subtype
50,000 Human Genome Projects!
Nature (2010) 464:993
54. 54
ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome data: Region of residence; Risk factors; Examination; Surgery; Radiation; Sample; Slide; Specific histological features; Analyte; Aliquot; Donor notes
• Gene Expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
ICGC OA Datasets
• Cancer Pathology: Histologic type or subtype; Histologic nuclear grade
• Patient/Person: Gender; Age range; Vital status; Survival time; Relapse type; Status at follow-up
• Gene Expression (normalized)
• DNA methylation
• Computed Copy Number and Loss of Heterozygosity
• Newly discovered somatic variants
http://goo.gl/w4mrV
55. 55
Secondary Goal: coordinate work to benefit productivity
http://goo.gl/K5mHC3
59. 59
Policy
ICGC membership implies compliance with Core Bioethical Elements for samples used in ICGC Cancer Projects:
http://goo.gl/TFrCmK
http://goo.gl/nYx6YG
60. 60
POLICY:
The members of the International Cancer Genome Consortium (ICGC) are committed to the principle of rapid data release to the scientific community.
http://goo.gl/TFrCmK
61. 61
Publication Policy
• The individual research groups in the ICGC are free to publish the results of their own efforts in independent publications at any time (subject, of course, to any policies of any collaborations in which they may be participating).
64. 64
Where do you find that information?
• We actually make it hard to find, but we are working on that! (This is an example of where ICGC would like to do what TCGA does!)
• http://cancergenome.nih.gov/publications/publicationguidelines
65. 65
Where do you find that information?
For ICGC data:
• Need to find the policy!
• http://icgc.org/icgc/goals-structure-policies-guidelines/e3-publication-policy
• Find text:
• Find date: in README on FTP file
• This is bad, we know it, and we are fixing it!
• If in doubt, contact us: info@icgc.org
66. 66
Policy on Intellectual Property
• All ICGC members agree not to make claims to possible IP derived from primary data (including somatic mutations) and to not pursue IP protections that would prevent or block access to or use of any element of ICGC data or conclusions drawn directly from those data.
http://goo.gl/TCMXCl
68. 68
DCC Activities
DCC activities are split between two groups:
• Software Development
– DCC portal
– Submission tool
• Biocuration (which also includes Content
Management)
– Data level management
– Submitter “handling”
– Coordination with secretariat
– User support
http://dcc.icgc.org/team
72. 72
ICGC Biocuration
• Helping submitters get their data to ICGC
• Progress reporting (data audit)
• Quality checks (coverage, correctness, etc.)
• Helping users get to the data
• Validate and check (and recheck) metadata on public repositories
• Test and integrate with other public repositories via standard data formats, ontologies.
• Documentation, documentation, and more documentation
• Training
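The quality checks listed above (coverage, correctness, metadata validation) come down to checking submitted records against a data dictionary of required fields and controlled vocabularies. A hypothetical sketch — the field names and vocabularies below are illustrative, not the actual ICGC submission dictionary:

```python
# Validate submitted clinical records against a small data dictionary.
# Field names and vocabularies are invented for illustration; the real
# ICGC submission system validates against its published data dictionary.

REQUIRED = {"donor_id", "donor_sex", "donor_vital_status"}
VOCAB = {
    "donor_sex": {"male", "female", "unknown"},
    "donor_vital_status": {"alive", "deceased", "unknown"},
}

def validate(record: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing field: {f}" for f in sorted(REQUIRED - record.keys())]
    for field, allowed in VOCAB.items():
        value = record.get(field)
        if value is not None and value not in allowed:
            errors.append(f"{field}: '{value}' not in controlled vocabulary")
    return errors

print(validate({"donor_id": "DO1", "donor_sex": "F"}))
# flags the missing vital status and the non-vocabulary sex code
```

In practice this kind of check runs at submission time, so errors are reported back to the submitting project before a dataset enters a release.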
73. 73
ICGC datasets to date
ICGC Data Portal Cumulative Donor Count for Member Projects
[Chart: number of donors, 0 to 14,000, by data release, Release 7 through Release 17.]
75. 75
Clinical Data Completeness
Overall Donor Clinical Data Completeness
[Bar chart: average percentage completeness of donor fields: Donor ID; Donor sex; Donor age at diagnosis; Donor age at last followup; Donor survival time; Donor diagnosis ICD-10; Donor interval of last followup; Disease status at last followup; Donor region of residence; Donor vital status; Donor tumour staging system at diagnosis; Donor tumour stage at diagnosis; Donor tumour stage at diagnosis supplemental; Donor relapse interval; Donor relapse type.]
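The "average percentage completeness" plotted here can be reproduced per field as the share of donors with a non-empty value. A sketch with invented records (the real computation runs over the submitted clinical files):

```python
# Compute per-field completeness (% of records with a non-empty value),
# as in the clinical-data-completeness charts. All records are invented.

def completeness(records: list, fields: list) -> dict:
    """Map each field name to the percentage of records that populate it."""
    return {
        f: 100.0 * sum(1 for r in records if r.get(f) not in (None, "")) / len(records)
        for f in fields
    }

donors = [
    {"donor_id": "DO1", "donor_sex": "female", "donor_survival_time": 420},
    {"donor_id": "DO2", "donor_sex": "male",   "donor_survival_time": None},
    {"donor_id": "DO3", "donor_sex": "male"},
]
print(completeness(donors, ["donor_id", "donor_sex", "donor_survival_time"]))
# donor_id and donor_sex are 100% complete; survival time only ~33%
```

Reporting this per project is one of the "progress reporting (data audit)" activities listed earlier.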
77. 77
Clinical Data Completeness
Overall Specimen Clinical Data Completeness
[Bar chart: average percentage completeness (0 to 100) of specimen fields: Donor ID; Specimen ID; Specimen type; Specimen type other; Specimen interval; Specimen processing; Specimen processing other; Specimen storage; Specimen storage other; Specimen available; Specimen donor treatment type; Specimen donor treatment type other; Specimen Biobank; Specimen Biobank ID; Tumour confirmed; Tumour histological type; Tumour grade; Tumour grade supplemental; Tumour grading system; Tumour stage; Tumour stage supplemental; Tumour stage system; Level of cellularity; Percentage cellularity; Digital image of stained section.]
79. 79
[Diagram: data flow among DACO, ICGC, cgHUB, EGA, ERA and TCGA; open and controlled BAM files; germline data + EGA id.]
80. [Diagram: ICGC BAM/FASTQ, TCGA BAM/FASTQ, ICGC Open Data (includes TCGA Open Data), and COSMIC Open Data.]
81. 81
Raw Data Availability at EGA by Project and Data Type
• https://www.ebi.ac.uk/ega/organisations/EGAO00000000024
92. 92
Highlights of the new portal: dcc.icgc.org
• Faceted search capabilities for variants, genes and donors
– Makes interactive data exploration fast and easy
• Mutation aggregation & counts across donors and cancers
– e.g. the number of pancreatic cancer donors with the KRAS G12D mutation
• Standardized gene consequences across all projects
• Genome browser
• Data download
• Protein domains
• Links to repositories
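The kind of faceted query described above (e.g. counting pancreatic cancer donors carrying a KRAS mutation) can also be built programmatically. The sketch below only assembles a filter object and query string in the general shape a faceted REST search accepts; the field names (`primarySite`, `symbol`) and the filter schema are illustrative assumptions, not the portal's documented API.

```python
import json
from urllib.parse import urlencode

def build_portal_query(primary_site: str, gene_symbol: str) -> str:
    """Build a hypothetical faceted-search query string for donors with
    a mutation in a given gene at a given primary site."""
    # Nested facet filter: restrict donors by primary site and genes by symbol.
    filters = {
        "donor": {"primarySite": {"is": [primary_site]}},
        "gene": {"symbol": {"is": [gene_symbol]}},
    }
    # Serialize the filter object as a JSON-encoded query parameter.
    return urlencode({"filters": json.dumps(filters)})

query = build_portal_query("Pancreas", "KRAS")
print(query)
```

The point of the sketch is the shape of the request: one JSON filter object combining facets, rather than separate ad hoc parameters, which is what makes faceted exploration composable.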
94. 94
• Summary
• Cancer type distribution
• Other links (COSMIC, Entrez, etc.)
• Mutation profile in protein
• Domains
• Genomic Context
• Mutation profile
• Most common mutations
99. 99
Donor
• Donor ID
• Primary site
• Cancer Project
• Gender
• Tumor Stage
• Vital Status
• Disease Status
• Release type
• Age at diagnosis
• Available data types
• Analysis types
107. 107
[Diagram: BIG DATA. Raw data, metadata, and interpreted data each pass validation ✔]
108. 108
[Diagram: raw data flow. ICGC BAM files and germline data (+ EGA id) deposited at EGA/ERA; TCGA BAM files at dbGaP; open data available directly; controlled access mediated by DACO]
109. 109
ICGC Data Categories
ICGC Open Access Datasets
• Cancer Pathology: histologic type or subtype; histologic nuclear grade
• Donor: gender; age range
• RNA expression (normalized)
• DNA methylation
• Genotype frequencies
• Somatic mutations (SNV, CNV and structural rearrangement)
ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome Data: patient demography; risk factors; examination; surgery/drugs/radiation; sample/slide; specific histological features; protocol; analyte/aliquot
• Gene expression (probe-level data)
• Raw genotype calls (germline)
• Gene-sample identifier links
• Genome sequence files
Most of the data in the portal is publicly available without restriction. However, access to some data, like the germline mutations, requires authorization by the Data Access Compliance Office (DACO).
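The open vs. controlled split above can be encoded as a simple lookup, which is handy when scripting bulk downloads and deciding which categories need DACO authorization first. This is a minimal sketch: the `requires_daco` helper and the lowercase category keys are illustrative, not part of any ICGC tooling.

```python
# Hypothetical helper mapping ICGC data categories to their access tier,
# following the open/controlled split shown on the slide.
OPEN_ACCESS = {
    "cancer pathology",
    "rna expression (normalized)",
    "dna methylation",
    "genotype frequencies",
    "somatic mutations",
}
CONTROLLED_ACCESS = {
    "detailed phenotype and outcome data",
    "gene expression (probe-level data)",
    "raw genotype calls (germline)",
    "gene-sample identifier links",
    "genome sequence files",
}

def requires_daco(category: str) -> bool:
    """Return True if the data category needs DACO authorization."""
    key = category.strip().lower()
    if key in CONTROLLED_ACCESS:
        return True
    if key in OPEN_ACCESS:
        return False
    raise ValueError(f"Unknown ICGC data category: {category}")

print(requires_daco("Genome sequence files"))  # -> True
print(requires_daco("DNA methylation"))        # -> False
```

Raising on unknown categories (rather than defaulting to open) errs on the safe side for controlled-access data.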
112. 112
ICGC Open Access Datasets
• Cancer Pathology
– Histologic type or subtype
– Histologic nuclear grade
• Patient/Person
– Gender, age range
– Vital status, survival time
– Relapse type, status at follow-up
• Gene expression (normalized)
• DNA methylation
• Computed copy number and loss of heterozygosity
• Newly discovered somatic variants
ICGC Controlled Access Datasets
• Detailed Phenotype and Outcome data
– Region of residence
– Risk factors
– Examination
– Surgery
– Radiation
– Sample
– Slide
– Specific histological features
– Analyte
– Aliquot
– Donor notes
• Gene expression (probe-level data)
• Raw genotype calls
• Gene-sample identifier links
• Genome sequence files
http://goo.gl/w4mrV
113. Identify yourself
Fill out a detailed form, which includes:
• Contact and project information
• Information technology details and procedures for keeping data secure
• Data Access Agreement
All of these documents are put into a PDF file that you print and have your institution sign off on your behalf.
121. 121
Bioinformatics Citizenship: What it means, and what does it cost?
Nature 409:452
122. 122
Important messages:
• The ICGC portal is evolving and getting better all
the time
• Lots of data provided by the ICGC
• Important to be good citizens of the scientific world
• The idea behind all of this is to provide tools to
help cure cancer
• Need to respect policies and guidelines
• There is help out there, and user feedback is
*always* welcome.
123. 123
Acknowledgments
DCC Software Developers
Vincent Ferretti
Daniel Chang
Anthony Cros
Jerry Lam
Brian O'Connor
Bob Tiernay
Stuart Watt
Shane Wilson
Junjun Zhang
ICGC Project leaders
at the OICR:
Tom Hudson
John McPherson
Lincoln Stein
Jared Simpson
Paul Boutros
Vincent Ferretti
Francis Ouellette
Jennifer Jennings
http://oicr.on.ca http://icgc.org
Ouellette Lab
Michelle Brazas
Emilie Chautard
Nina Palikuca
Zhibin Lu
Web Dev
Joseph Yamada
Angela Chao
Daniel Gross
Kamen Wu
Kim Cullion
Miyuki Fukuma
Wen Xu
Pipeline Development
& Evaluation
Morgan Taschuk
Michael Laszloffy
Peter Ruzanov
ICGC DCC Biocuration
Hardeep Nahal
Marc Perry
Research IT/Systems
David Sutton,
Bob Gibson
Sam Maclennan
David Magda
Rob Naccarato
Brian Ott
Gino Yearwood
EGA
Justin Paschall
Jeff Almeida-King
Ilkka Lappalainen
Jordi Rambla De Argila
Marc Sitges Puy
… and all the patients and their families who are putting their hopes into our work!