Describes NCI's Center for Strategic Scientific Initiatives activities (2005 - 2017) as well as data and technology activities of the 2016 White House Cancer Moonshot Task Force (2016 - 2017).
Advancing Innovation and Convergence in Cancer Research: US Federal Cancer Mo... (Jerry Lee)
Special seminar at the 8th Taiwan Biosignatures Workshop sharing the overall work of NCI's Center for Strategic Scientific Initiatives since 2003, as well as CSSI's influence on select projects initiated by the 2016 White House Cancer Moonshot Task Force. These include the Applied Proteogenomics Organizational Learning and Outcomes (APOLLO) network, the International Cancer Proteogenome Consortium, and the Blood Profiling Atlas in Cancer (BloodPAC) commons.
LLS Southern California Blood Cancer Conference, March 4, 2017 (Jerry Lee)
Jerry S.H. Lee, Ph.D. is the Health Sciences Director and Deputy Director of the Center for Strategic Scientific Initiatives (CSSI) at the National Cancer Institute (NCI). He discussed advancing innovation and convergence in cancer research. Key points included:
- CSSI's role in creating exploratory programs to accelerate cancer data sharing and tool development.
- Progress made by The Cancer Genome Atlas (TCGA) in collecting and analyzing tumor samples to discover new cancer subtypes and therapeutic targets.
- Importance of data quality, standardization, and sharing enabled by initiatives like TCGA to generate new insights into cancer biology.
- Continued momentum in 2017 to leverage data from initiatives like
Advancing Convergence and Innovation in Cancer Research: Seminar at Universit... (Jerry Lee)
Since 2003, the National Cancer Institute’s Center for Strategic Scientific Initiatives (CSSI) has worked to develop the resources and infrastructures investigators need to surmount roadblocks in cancer research. CSSI manages programs that promote technology development and cross-disciplinary collaboration and provide support for investigators in nascent and challenging research fields. This support includes funding opportunities, shared reagent and database resources, and assistance in the development of standards and protocols. CSSI also provides a network of partners in industry and government that can help NCI-funded researchers advance their technologies toward commercialization and translation. This presentation will highlight technologies including single-cell isolation and analysis techniques that have been supported through various CSSI mechanisms from proof-of-concept to translation into the clinic.
NCI Clinical Genomics Data Sharing, NCRA, Sept 2016 (Warren Kibbe)
The document discusses the Genomic Data Commons (GDC), an effort by the National Cancer Institute to standardize and simplify submission of genomic cancer data and make it accessible to researchers according to FAIR principles. The GDC stores raw genomic data from over 50,000 cancer cases and associated clinical information to support discovery, validation of new therapies, and precision oncology. It aims to foster data sharing, reuse, and collaboration across cancer studies to advance understanding and treatment of cancer.
A Vision for a Cancer Research Knowledge System (Warren Kibbe)
The document discusses a vision for a cancer research knowledge system that uses data commons and cloud platforms. It describes how data commons co-locate data, storage, computing, and tools to create interoperable resources for researchers. The Genomic Data Commons aims to make data from over 30,000 cancer cases FAIR (Findable, Accessible, Interoperable, Reusable) and to provide attribution. This will help identify rare cancer drivers and factors influencing therapy response. The system incorporates multiple data types from studies and clinical trials to enable precision medicine approaches.
National Cancer Data Ecosystem and Data Sharing (Warren Kibbe)
Grand Rounds at the Siteman Cancer Center at Washington University, highlighting the Genomic Data Commons and the National Cancer Data Ecosystem defined by the Cancer Moonshot Blue Ribbon Panel.
NCI Cancer Imaging Program - Cancer Research Data Ecosystem (Warren Kibbe)
Given to the NCI Cancer Imaging Program monthly telecon on January 9th, 2017. NCI Genomic Data Commons, Beau Biden Cancer Moonshot Blue Ribbon Panel, Cancer Research Data Ecosystem and the role of imaging in precision medicine
NCI Cancer Genomics, Open Science and PMI: FAIR (Warren Kibbe)
Talk given to the NLM Fellows on July 8, 2016. Touches on Cancer Genomics, Open Science and PMI: FAIR in NCI genomics thinking and projects. Includes discussion of the Genomic Data Commons (GDC), Cancer Data Ecosystem, Data sharing, and the NCI cancer clinical trials open API.
canSAR Database: An Overview on System, Role and Application (inventionjournals)
This paper provides a technical overview of the canSAR database system, which it describes as the largest cancer database. The overview covers basic definitions and terminology; findings and advances in the field of cancer research made through canSAR; and the system's architecture, design, data sources, processing pipelines, screening tests, and structure-activity relationships.
Keynote at NVIDIA GPU Technology Conference in D.C. (Jerry Lee)
Presentation at the NVIDIA GPU Technology Conference in D.C. on how the Cancer Moonshot Task Force under Vice President Biden is using AI to help end cancer as we know it. Dr. Lee will discuss global efforts to empower AI and deep learning for oncology with larger and more accessible datasets.
Day 2 Big Data panel at the NIH BD2K All Hands 2016 meeting (Warren Kibbe)
Big data in oncology and implications for open data, open science, rapid innovation, data reuse, reproducibility, and data sharing. Cancer Moonshot, Precision Medicine Initiative (PMI), the Genomic Data Commons, NCI Cloud Pilots, NCI-DOE Pilots, and the Cancer Research Data Ecosystem.
SuperComputing 16 HPC Matters Panel on Precision Medicine (Warren Kibbe)
The National Cancer Institute aims to lessen the burden of cancer through scientific evidence. In 2016, there will be an estimated 1.7 million new cancer cases and 600,000 cancer deaths in the United States. Precision medicine will lead to a fundamental understanding of the complex interplay between genetics, epigenetics, nutrition, environment, and clinical presentation, which can direct effective cancer prevention and treatment. The Cancer Moonshot initiative draws on precision medicine, national computing initiatives, and the wide availability of genomic data to investigate cancer prediction using real-world data.
- The Personalized OncoGenomics (POG) program at the British Columbia Cancer Agency conducted whole-genome analysis on tumors from 100 patients with advanced or incurable cancers to inform treatment decisions.
- Fresh tumor and blood samples were obtained from patients and underwent whole-genome and RNA sequencing. Computational analysis identified potential driver mutations, genes and pathways.
- A multidisciplinary team discussed genomic findings weekly and established guidelines for interpreting and communicating results to integrate them into patient care. Genomic findings were considered actionable in 55 of 78 cases that underwent whole-genome analysis, and motivated treatment changes in 23 cases.
- The experience demonstrated that a multidisciplinary team can implement an approach where whole-genome
FDA NGS and Big Data Conference September 2014 (Warren Kibbe)
The document discusses the National Cancer Institute's efforts to address challenges in cancer data access and analysis through the development of the NCI Genomics Data Commons and NCI Cloud Pilots. The NCI Genomics Data Commons will provide integrated genomic and clinical cancer data from projects like TCGA to researchers. The NCI Cloud Pilots aim to explore cloud-based models for analyzing large cancer genomics datasets without having to download the full datasets locally, helping to enable more widespread data access and analysis. The goal is to build a national learning health system for cancer clinical genomics through open data sharing and cloud-based approaches.
- The document discusses the Total Cancer Care (TCC) approach at Moffitt Cancer Center, which aims to provide personalized cancer care through comprehensive data collection and analysis.
- TCC collects extensive clinical, genomic, treatment and outcomes data from over 78,000 consented patients to power research studies and clinical trials matching. Molecular profiling has been conducted on over 14,000 tumor samples.
- The TCC data is housed in a large integrated database and used by researchers for studies in areas like radiochemotherapy response, exome sequencing, immunology biomarkers, and cancer epidemiology.
- The database also helps clinicians identify eligible patients for clinical trials and develop evidence-based treatment pathways. The goal is to transform cancer
This document discusses Moffitt Cancer Center's Total Cancer Care program, which aims to transform cancer care through a personalized approach. It involves collecting extensive clinical, molecular, and biospecimen data from patients over their lifetime to power research. The goals are to improve outcomes through early detection, personalized treatment, and clinical trials matching. Moffitt has established an extensive biorepository and informatics platform to integrate data from over 78,000 consented patients to enable precision oncology research.
Federal Research & Development for the Florida system Sept 2014 (Warren Kibbe)
This document discusses challenges in cancer data integration and analysis. It proposes the development of open science models, standardized data elements, and sustainable informatics infrastructure. Emerging technologies like mobile devices, social media, and cloud computing create opportunities to build a national "learning health system" for cancer. The National Cancer Institute is pursuing initiatives like the Cancer Genomics Data Commons and cloud pilots to leverage large genomic and clinical datasets using these technologies and develop predictive models to improve outcomes. The ultimate goal is a system that facilitates data sharing, continuous learning from all cancer patients, and personalized, predictive oncology.
Cancer Research Data Ecosystem - Dr. Warren Kibbe (imgcommcall)
The document discusses the Cancer Research Data Ecosystem and the National Cancer Data Ecosystem being developed through the Beau Biden Cancer Moonshot initiative. It notes that cancer research and care generate large amounts of detailed data that are critical to creating a learning health system for cancer. It highlights efforts like the NIH Genomic Data Commons and the need for data standards to make cancer-related data more accessible, interoperable, and reusable to researchers. The goal is to maximize data sharing and reuse to advance the understanding of cancer and improve prevention and treatment outcomes.
US Federal Cancer Moonshot - One Year Later (Jerry Lee)
Presentation from former Cancer Moonshot Data and Technology Track co-chairs Jerry S.H. Lee, PhD (NCI, former OVP) and Dimitri Kusnezov, PhD (DOE), providing an update on efforts that will help realize the Data/Tech Track's vision of a national learning healthcare system for cancer. These include the NCI/DOE pilots, the DOE/VA pilot, the NCI GDC, DoD/VA/NCI APOLLO, NCI/GSK ATOM, and BloodPAC.
2016 Data Commons and Data Science Workshop, June 7-8, 2016. Genomic Data Commons, FAIR, NCI, and making data more findable, publicly accessible, interoperable (machine readable), and reusable, while supporting recognition and attribution.
Presentation "The Impact of All Data on Healthcare"
Keith Perry
Associate VP & Deputy CIO
UT MD Anderson Cancer Center
With continuing advancement in both technology and medicine, the drive is on to make all data meaningful to drive medical discovery and create actionable outcomes. With tools and capabilities to capture more data than ever before, the challenge becomes linking existing structured and unstructured clinical data with genomic data to increase the industry’s analytical footprint.
Learning Objectives:
∙ Discuss the need to make all data meaningful in order to speed discovery of new knowledge
∙ Provide examples of an analytical direction that supports evolution in medicine
∙ Expose the challenges facing the industry with respect to ~omics
Data sharing drivers in precision oncology, biomedical research, and healthcare. Accelerating discovery, innovation, providing credit for all stakeholders - patients, researchers, care providers, payers.
CI4CC Moonshot Blue Ribbon Panel Report 20161010 (Warren Kibbe)
Presentation to the Fall CI4CC meeting in Utah. CI4CC Moonshot Blue Ribbon Panel Report. Highlights of Vice President Biden's Cancer Moonshot and the NCI Blue Ribbon Panel Recommendations.
Converged IT Summit - NCI Data Sharing (Warren Kibbe)
Cancer Moonshot, Data Sharing, Genomic Data Commons, NCI Cloud Pilots, Cancer Research Data Ecosystem, technology advances, chemotherapy advances, MATCH, NCI Cancer Moonshot Blue Ribbon Panel Recommendations
Advancing the Prevention and Cure of Cancer (fondas vakalis)
The document discusses the shared missions and collaborations between the American Association for Cancer Research (AACR) and the National Cancer Institute (NCI) to advance cancer research and reduce the burden of cancer. It outlines their joint efforts in conferences, workshops, and think tanks. It also summarizes advances in cancer prevention, early detection, and treatment that have contributed to reduced cancer mortality rates in recent years, while noting that challenges remain.
Cancer Moonshot, Data sharing and the Genomic Data Commons (Warren Kibbe)
Gave the inaugural Informatics Grand Rounds at City of Hope on September 8th. NIH Commons, Genomic Data Commons, NCI Cloud Pilots, Cancer Moonshot and rationale for changing incentives around data sharing all discussed.
The document discusses standardizing clinical case report forms (CRFs) through the creation of a library of standardized CRF modules. It describes the process undertaken by a working group to analyze, harmonize and approve an initial CRF module on demography. The working group analyzed existing CRFs, agreed on core data elements, and obtained community input to create a standardized demography module. Subsequent modules on other topics will follow the same process.
This document provides an overview of the November 2000 issue of JALA (Journal of the Association for Laboratory Automation). It describes the development of a novel robotic system for the New York Cancer Project biorepository in collaboration with the Medical Automation Research Center. The biorepository receives 50-100 blood samples per day, which are processed robotically to extract, quantify, aliquot, and store DNA, plasma, and RNA so that they are accessible to investigators. The robotic system aims to provide rapid random access to the hundreds of thousands of DNA samples stored for high-throughput analysis in studies of gene-environment interactions and cancer risk.
Pistoia Alliance US Conference 2015 - 1.5.4 New data - Nikolaus Schultz (Pistoia Alliance)
The document provides an overview of cancer genomics data and tools for analyzing it. It discusses how next-generation sequencing is identifying genetic alterations in cancer at an increasing scale through projects like TCGA. The cBioPortal is highlighted as a tool that provides intuitive access to these complex cancer genomics datasets and helps identify patterns across data to provide clinical insights. It has become widely used and its code is now fully open source to further collaborative cancer research.
Kim Solez Tissue Engineering Pathology Meets Human Cell Atlas a Glimpse into ... (Kim Solez)
Dr. Kim Solez presents "Tissue Engineering Pathology Meets Aviv Regev's Human Cell Atlas: A Glimpse Into the Future of Pathology" on March 8th, 2017 at the University of Alberta in Edmonton, Alberta, Canada Copyright (c) 2017, JustMachines Inc.
Univ of Miami CTSI: Citizen science seminar; Oct 2014 (Richard Bookman)
The University of Miami's Clinical & Translational Science Institute runs a seminar course for MS students.
This talk surveys 8 citizen science projects, reviews NIH's current activities, and identifies issues for attention, particularly with ethical, legal and social implications.
- National challenges in cancer research include lowering barriers to data access and analysis, and integrating clinical and basic research data to enable improved outcomes.
- Disruptive technologies like high-throughput biology and ubiquitous computing are generating large amounts of molecular and clinical cancer data.
- The NCI is working to build infrastructure like the Genomics Data Commons and Cloud Pilots to make these data widely accessible and support data analysis.
- The goal is to develop a national "learning health system" that applies insights from real-world cancer data to research and clinical practice to continuously improve patient care and outcomes.
This document provides information about the "BioData World West 2017" conference taking place April 26-27, 2017 in San Francisco. The conference will bring together over 200 participants from various backgrounds to discuss disruptive approaches in drug development, personalized medicine, and clinical applications of big data in precision medicine. Expert speakers will present on topics including genomics, precision medicine, and a new AI track in partnership with Merck. Registering online reserves a place at the conference. Featured sessions will explore various applications and challenges of harnessing big data in healthcare and biomedicine.
Enabling Translational Medicine with e-Science (Ola Spjuth)
This document discusses how e-science can enable translational medicine and support cancer research. It highlights several projects using e-science approaches:
1) The SAIL method integrates data across biobanks to enable cross-archive research.
2) eCPC uses imaging analysis, biomarkers, and microsimulation modeling on high-performance computing to develop more accurate risk prediction and screening strategies.
3) The ClinSeq project applies clinical sequencing, machine learning, and data integration to develop individualized cancer diagnostics and define clinically relevant biomarkers.
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
Advancing Convergence and Innovation in Cancer Research
1. Prostate Cancer Foundation Scientific Retreat: Young Investigator Day
Washington D.C.
October 4th, 2017
Jerry S.H. Lee, Ph.D.
Health Sciences Director
Deputy Director, Center for Strategic Scientific Initiatives (CSSI)
Joint Executive for Data Integration, Center for Biomedical Informatics and Information Technology (CBIIT)
Office of the Director, National Cancer Institute (NCI), National Institutes of Health (NIH)
Advancing Innovation and Convergence in Cancer Research:
National Cancer Institute’s Center for Strategic Scientific Initiatives
2. WHO I am & WHAT is CSSI?
2016 at a glance
2017: continuing the momentum
3. “…it is of critical national importance
that we …double the rate of progress
in the fight against cancer -- and put
ourselves on a path to achieve in just
5 years research and treatment gains
that otherwise might take a decade or
more…”
4. [Figure: NCI budget, 2003–2013 (source: NCI Factbooks, http://obf.cancer.gov/financial/factbook.htm), with a timeline of NCI Directors (von Eschenbach, Niederhuber, Varmus, Lowy; dates marked 10/18/06, 07/12/10, 04/01/15) and NCI Principal Deputy Directors (Barker, Lowy). Jerry joins: 07/06.]
5. [Timeline, 2005–2018]
2006: Joined NCI Center for Strategic Scientific Initiatives (CSSI)
2008: Official “Other Duties As Assigned”
2009: Transitioned to Deputy Director, CSSI
2016 (4/14/16–10/17/16): Served as Deputy Director for Cancer Research and Technology, WH Cancer Moonshot Task Force
PhD in Chemical and Biomolecular Engineering: Nuclear and Cellular Mechanics: Implications for Laminopathies and Cancer
6. “New Cancer Test Stirs Hope and Concern” (2004)
Lancet 2002; 359: 572-577
Nature 2004; 429: 496-497
7. “The working group recommends
the initiation of a bold technology-
based project: Human Cancer
Genome Project.”
- National Cancer Advisory Board (NCAB) Working Group on
Biomedical Technology, February 16, 2005
https://deainfo.nci.nih.gov/advisory/ncab/workgroup/archive/sub-bt/NCABReport_Feb05.pdf
8. “…the unstated goal of the HCGP is to accelerate the discovery
of cures for cancers. The question we need to answer is not
whether the information generated will be useful, but whether, if
given $1.5 billion in “new” cancer money, would the HCGP
be the best application of that money toward the goal of
cancer cures…”
– Oct 21, 2005
9. “…investigator-initiated grants have
become impossible to get…young
researchers don’t have much of a
future…those that determine
funding…have lost sight of the most
important element…it is not large
research consortia, not new
technologies, not cancer centers, but
the young individual investigator…”
“…a human Cancer Genome
Atlas…would systematically sequence
tumor samples for mutations involved
in cancer to speed up the search for
new drugs and diagnostics…its project
price tag of $1.5 billion over a decade
was whittled down to a 3-year, $100
million pilot…”
12. NCI Center for Strategic Scientific Initiatives (CSSI): Concept Shop
Director: Douglas R. Lowy, MD
Deputy Director: Jerry S.H. Lee, PhD
Mission: “…to create and uniquely implement exploratory programs focused on the development and integration of advanced technologies, trans-disciplinary approaches, infrastructures, and standards, to accelerate the creation and broad deployment of data, knowledge, and tools to empower the entire cancer research continuum in better understanding and leveraging knowledge of the cancer biology space for patient benefit…”
Program approval dates:
2003, 2007, 2011, 2013, 2014
2004, 2008, 2014
2005, 2010, 2015
2005, 2008 2010
2008, 2013*
2011, 2014
Dates indicate approval(s) by NCI Board of Scientific Advisors; *Program moved to NCI Division of Cancer Biology
13. Translation from basic
science to human studies
Translation of new interventions into
the clinic and health decision making
Defining mechanisms,
targets, and lead molecules
New methods of diagnosis,
treatment, and prevention
Delivery of recommended and
timely care to the right patient
True Benefit to society
Controlled studies
leading to effective care
14. Translation Pace: How To Break Out of Current Paradigm?
Key Needs (from community ‘02):
• Standards and protocols
• Real-time, public release of data
• Large, multi-disciplinary teams
• Pilot-friendly team environment to share failures and successes
• Team members with trans-disciplinary training
Turning the Crank… The potential to transform cancer drug discovery and diagnostics
[Figure from Paul et al., Nature Rev. Drug Discovery, March 2010: stage costs $150M; Phase I: $273M; Phase II: $319M; Phase III: $314M; $48M; $414M; $166M; $94M; ~$1.8B/turn]
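The ~$1.8B per "turn of the crank" is the sum of the eight stage costs on the slide. The sketch below only reproduces that sum. Phase I–III are the only stage labels that survived extraction, so the other dictionary keys are hypothetical placeholders. It also assumes, as in Paul et al., that these are costs per launched drug.

```python
# Stage costs as listed on the slide, in US$ millions.
# Only Phase I-III are labeled on the slide; the "unlabeled_*" keys are
# hypothetical placeholders for stages whose names were lost in extraction.
STAGE_COSTS_MUSD: dict[str, int] = {
    "unlabeled_1": 150,
    "phase_1": 273,
    "phase_2": 319,
    "phase_3": 314,
    "unlabeled_2": 48,
    "unlabeled_3": 414,
    "unlabeled_4": 166,
    "unlabeled_5": 94,
}


def total_cost_per_turn(stage_costs: dict[str, int]) -> int:
    """Sum per-stage costs to get the cost of one full 'turn of the crank'."""
    return sum(stage_costs.values())


if __name__ == "__main__":
    total = total_cost_per_turn(STAGE_COSTS_MUSD)
    # Convert millions to billions and round to one decimal place.
    print(f"${total}M is about ${total / 1000:.1f}B per turn")
```

The script prints a total of $1778M, which rounds to the slide's ~$1.8B.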
15. “What is Water?”: Measurements Insights
Qualitative descriptions:
• Color (clear, yellow, brown)
• Taste (none, metallic, awful)
Measurements taken:
• Phase (liquid, gas, solid)
• Phase change (boil, melt, freeze)
LOTS of quantitative “data”… but also LOTS of disagreements:
Boiling point = 92 °C vs. boiling point = 100 °C
16. “What is Water?”: Standards and Sharing of Data
• Define samples and protocols
• Share collected data
New Insights and Understanding:
• New parameter, “pressure”: boiling point = 92 °C at 2,400 m vs. 100 °C at 0 m
• LOTS of quantitative and reproducible data (steam table)
• New understanding (phase diagram): phase boundaries, vapor/liquid (V/L) equilibrium, triple point
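Once pressure is treated as a parameter, the two "disagreeing" boiling points agree with each other. The sketch below is a back-of-the-envelope check, not part of the original slide. It assumes the standard-atmosphere barometric formula for the troposphere, the Clausius-Clapeyron relation with a constant enthalpy of vaporization (40.66 kJ/mol), and ideal-gas vapor.

```python
import math

# Physical constants (SI units)
R = 8.314462618      # molar gas constant, J/(mol*K)
G = 9.80665          # standard gravity, m/s^2
M_AIR = 0.0289644    # molar mass of dry air, kg/mol

# Assumed standard-atmosphere troposphere parameters
P0 = 101_325.0       # sea-level pressure, Pa
T0_AIR = 288.15      # sea-level air temperature, K
LAPSE = 0.0065       # temperature lapse rate, K/m

# Water properties, assumed constant over this temperature range
T_BOIL_SEA = 373.15  # normal boiling point at P0, K
DH_VAP = 40_660.0    # enthalpy of vaporization near 100 °C, J/mol


def pressure_at_altitude(h_m: float) -> float:
    """Barometric formula for the troposphere: pressure (Pa) at altitude h_m."""
    exponent = G * M_AIR / (R * LAPSE)
    return P0 * (1.0 - LAPSE * h_m / T0_AIR) ** exponent


def boiling_point_c(h_m: float) -> float:
    """Boiling point of water (°C) at altitude, via Clausius-Clapeyron:
    1/T = 1/T_b - (R / dH_vap) * ln(P / P0)
    """
    p = pressure_at_altitude(h_m)
    inv_t = 1.0 / T_BOIL_SEA - (R / DH_VAP) * math.log(p / P0)
    return 1.0 / inv_t - 273.15


if __name__ == "__main__":
    for h in (0, 2400):
        print(f"{h:>5} m: {pressure_at_altitude(h) / 1000:.1f} kPa, "
              f"boils at {boiling_point_c(h):.0f} °C")
```

At 0 m the script prints a boiling point of about 100 °C. At 2,400 m, where the pressure is about 75.6 kPa, it prints about 92 °C. Both values match the two readings on the slide.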
18. 2006–2015: A Decade of Illuminating the Underlying Causes of Primary Untreated Tumors (12,000+ patient tumors and increasing)
[Figure: primary tumor (localized)]
20. “…to conduct this mini–cancer-genome project, a 29-person team, resequenced…11
breast cancer samples and 11 colon cancer samples…then winnowed out more than
99% of the mutations by removing errors…and changes that didn’t alter a protein.
…this yielded a total of 189 “candidate” cancer genes. Although some are familiar…most
had never been found mutated in cancer before. The results…are a ‘treasure trove’…
…the relatively small number of new genes common to the tumors reinforces concerns
about [NIH] The Cancer Genome Atlas…
…despite such doubts, the atlas project gets under way next week. NIH will announce
the three cancers to be studied in the pilot phase…the project is on an extremely
aggressive timeline…”
21. Three Cancers (Pilot), Multiple Data Types
Cancers:
• Glioblastoma multiforme (brain)
• Squamous carcinoma (lung)
• Serous cystadenocarcinoma (ovarian)
Data types:
• Clinical diagnosis
• Treatment history
• Histologic diagnosis
• Pathologic status
• Tissue anatomic site
• Surgical history
• Gene expression
• Chromosomal copy number
• Loss of heterozygosity
• Methylation patterns
• miRNA expression
• DNA sequence
Infrastructure:
• Biospecimen Core Resource with more than 13 Tissue Source Sites
• 7 Cancer Genomic Characterization Centers
• 3 Genome Sequencing Centers
• Data Coordinating Center
25. Different Perspectives Using TCGA Data (2012)
Academic: courtesy of Peter Stojanov, Dana-Farber, TCGA 2012
Industry: courtesy of Nickolay Khazanov, Compendia Bioscience, TCGA 2012
29. Re-writing Central Dogma (2016)
On average across 375
tumor samples, ONLY 33%
of DNA/RNA predicted
cancer protein abundance
Zhang, B. et al. Proteogenomic characterization of human colon and rectal cancer. Nature. 2014 Jul 20
30. http://cancerimagingarchive.net
• 33,000 total subjects
in the archive
• 67 data sets currently
available
• 21 from The Cancer
Genome Atlas project
• 10 from the Quantitative
Imaging Network
• Clinical trial data from
ECOG-ACRIN and RTOG
35. Overarching Structure of CPTAC 3.0 (2016–2021)
A. Proteome Characterization Centers: 5-6 new treatment-naïve cancer types, plus additional cancer types where questions remain on their proteogenomic complexity
B. Proteogenomic Translational Research Centers: research models and NCI-sponsored clinical trials
C. Proteogenomic Data Analysis Centers: develop innovative tools that process and integrate data across the entire proteome
Data, assays and resources: community resources
Henry Rodriguez (henry.rodriguez@nih.gov)
36. Proteogenomic Translational Research Centers
Structure and Information
Applications must cover BOTH preclinical studies and studies
with clinical biospecimens from NCI-sponsored trials
Preclinical Research Arm
• Comprehensively characterize and quantitatively measure
proteins and their variants along with associated genomics
in preclinical cancer model samples
Clinical Research Arm
• Develop and apply quantitative proteomic assays to cancer-relevant proteins identified in Preclinical Research Arm or preliminary data, to NCI-sponsored clinical trial samples
(http://proteomics.cancer.gov/aboutoccpr/fundingopportunities/current/Reissuance-of-Clinical-Proteomic-Tumor-Analysis-Consortium)
39. http://www.cancer.gov/moonshot
October 17, 2016
“…established, within the Office of the Vice President, a
White House Cancer Moonshot Task Force, which will
focus on making the most of Federal investments,
targeted incentives, private sector efforts from industry
and philanthropy, patient engagement initiatives, and
other mechanisms to support cancer research and enable
progress in treatment and care…”
“…a Blue Ribbon Panel… will provide expert advice on the
vision, proposed scientific goals, and implementation of the
National Cancer Moonshot….the Panel will provide an intensive
examination of the opportunities and impediments in cancer
research…initial findings and recommendations of the Panel will
be reported to the National Cancer Advisory Board that will
provide final recommendations to the NCI Director…”
40. Cancer Moonshot
Federal Task Force
Vice President’s Office
“Blue Ribbon Panel”
Working Groups
NCAB
NCI
Courtesy of Dinah Singer (http://deainfo.nci.nih.gov/advisory/bsa/0316/0905Singer.pdf)
41. Strategic goals:
• Catalyze New Scientific Breakthroughs
• Unleash the Power of Data
• Accelerate Bringing New Therapies to Patients
• Strengthen Prevention and Diagnosis
• Improve Patient Access and Care
Implementation path: Federal; Private/Non-Profit; Public-Private Collaboration
2/1/2016 to 10/17/2016
42. Cancer Moonshot Data & Technology Team
Co-Chairs: Dimitri Kusnezov (DOE), DJ Patil (OSTP), and Jerry Lee (OVP)
Members:
• John Scott (DoD)
• Craig Shriver (DoD)
• Cheryll Thomas (CDC)
• Frances Babcock (CDC)
• Teeb Al-Samarrai (DOE)
• Sean Khozin (FDA)
• Alexandra Pelletier (PIF)
• Maya Mechenbier (OMB)
• Henry Rodriguez (NCI)
• Karen Cone (NSF)
• Michael Kelley (VA)
• Louis Fiore (VA)
• Warren Kibbe (NCI)
• Betsy Hsu (NCI)
• Niall Brennan (CMS)
• Thomas Beach (USPTO)
• Claudia Williams (OSTP)
• Vikrum Aiyer (USPTO)
• Tom Kalil (OSTP)
• Kathy Hudson (NIH)
• Dina Paltoo (NIH)
• Al Bonnema (DoD)
• Michael Balint (PIF)
• Kara DeFrias (OVP)
• Greg Pappas (FDA)
• Erin Szulman (OSTP)
• Paula Jacobs (NCI)
44. Without a National Learning Healthcare System for Cancer
[Figure: a patient moves from primary care to a cancer center (cancer diagnosis and treatment), back to primary care as a cancer survivor (months–years), and then to cancer relapse (months–years); assumes returning to the same cancer care facility]
• Primary care unable to share primary care data: lost opportunity to learn from pre-cancer clinical data
• Cancer center unable to share cancer care data: lost opportunity to learn from post-cancer treatment clinical data
45. Vision:
Enable the creation of a Learning Healthcare System
for Cancer, where as a nation we learn from the
contributed knowledge and experience of every
cancer patient. As part of the Cancer Moonshot, we
want to unleash the power of data to enhance, improve,
and inform the journey of every cancer patient from the
point of diagnosis through survivorship.
46. Priority Areas and Ongoing Activities
Priority Area A: Enabling a seamless data environment [If you build it…]
MVP CHAMPION and NCI GDC
Priority Area B: Unlocking science through open computational and storage platforms [Make it easy AND relevant to use…]
APOLLO
Priority Area C: Workforce development using open and connected data [They will come…]
NCI-VA BD-STEP
48. Million Veteran Program Computational Health Analytics for
Medical Precision to Improve Outcomes Now
(MVP CHAMPION)
Department of Veterans Affairs (VA) and the Department of Energy (DOE) are announcing a new five-year
collaboration to apply the most powerful computational assets at the DOE’s National Labs to nearly half a million
veterans' records from one of the world's largest research cohorts -- the Million Veteran Program
49. [Figure: VA Medical Centers Regional/Corporate Data Warehousing and Analytical Environment. Regions 1–4 each have a Regional Data Warehouse (RDW) and a MOSS farm. The RDWs group VISNs V18–V22, V12/V15–V17/V23, V1–V5, and V6–V11. A MOSS farm provides Performance Point, Excel, Reporting, Analysis, Collaboration, and Team Foundation Services. The enterprise level comprises the CDW, SAS Grid, VINCI, Apps, PMAS, GIS, and a MOSS farm.]
Courtesy of Ross Fletcher (DC VAMC)
50. NCI Genomic Data Commons
launched at ASCO on June 6, 2016
https://gdc-portal.nci.nih.gov
2.6 PB of legacy data and 1.5 PB of harmonized data.
51. GDC Content
Current:
• TCGA: 11,353 cases
• TARGET: 3,178 cases
Coming soon:
• Foundation Medicine: 18,000 cases
• Cancer studies in dbGaP: ~4,000 cases
Planned (1-3 years):
• NCI-MATCH: ~3,000 cases
• Clinical Trial Sequencing Program: ~3,000 cases
• Cancer Driver Discovery Program: ~5,000 cases
• Human Cancer Model Initiative: ~1,000 cases
• APOLLO – VA-DoD: ~8,000 cases
Total: ~56,000 cases
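The ~56,000 total is the sum of the current, coming-soon and planned case counts. The sketch below adds them up by status. Two assumptions apply: the slide's "~" counts are treated as exact, and all five last rows are read as belonging to "Planned (1-3 years)". That grouping is a reading of the flattened slide, not something the slide states outright.

```python
# Case counts from the GDC Content slide, grouped by status.
# Approximate ("~") values are treated as exact for this sum.
GDC_CASES: dict[str, dict[str, int]] = {
    "current": {"TCGA": 11_353, "TARGET": 3_178},
    "coming_soon": {"Foundation Medicine": 18_000, "Cancer studies in dbGaP": 4_000},
    "planned_1_3_years": {
        "NCI-MATCH": 3_000,
        "Clinical Trial Sequencing Program": 3_000,
        "Cancer Driver Discovery Program": 5_000,
        "Human Cancer Model Initiative": 1_000,
        "APOLLO - VA-DoD": 8_000,
    },
}


def totals_by_status(cases: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum case counts within each status group."""
    return {status: sum(group.values()) for status, group in cases.items()}


if __name__ == "__main__":
    totals = totals_by_status(GDC_CASES)
    for status, n in totals.items():
        print(f"{status}: {n:,} cases")
    print(f"total: {sum(totals.values()):,} cases")
```

The grand total comes to 56,531 cases, consistent with the slide's ~56,000.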
54. MCC Military Clinical Trials Network
Murtha Cancer Center (MCC):
• Clinical Trials: increased access
• Referral Center: high cost/low volume
• Genetics Counseling: telehealth technologies
• Training & Education: distributed learning/fellowships
• Standardized Clinical Practice Guidelines: evidence-based clinical practice & research
• Patient Outreach: education and information
MCC Membership: MCC Clinical Trials Network Medical Treatment Facilities (MHS)
• Naval Medical Center, Portsmouth, VA
• Naval Medical Center, San Diego, CA
• Womack Army Medical Center, Ft Bragg, NC
• Keesler Air Force Medical Center, Biloxi, MS
• Lackland Air Force Medical Center, San Antonio, TX
Courtesy of Craig Shriver (DoD)
55. How Could This Help the Patients? (2026)
[Figure, 2016 to 2026: VA and DoD; Proteogenomics Characterization Centers (PCC); Proteogenomics Translational Research Centers (PTRC)]
58. APOLLO – Applied Proteogenomics OrganizationaL Learning and Outcomes consortium
[Figure: APOLLO workflow]
• Patients with new or recurrent cancer diagnosis (veterans; active duty & DoD beneficiaries; civilians) consent to the VA/DoD/NCI APOLLO research program and co-enroll in MVP
• Proteogenomics characterization (~8,000 patients): The American Genome Center; CPTAC PCC + MCC PRO / IHC
• Residual tissue for CLIA-approved targeted sequencing (CATS)
• VA ORD and NCI-sponsored clinical trials: NCI CTEP/CPTAC PTRC
• Clinical phenotype & outcomes: VA hospitals; Murtha Cancer Center
• Clinical data and research data (DaVINCI, Registry, DPALS, CATS) feed an adaptive learning healthcare system
• Data aggregation, analysis, and sharing to rapidly improve outcomes for active duty, beneficiaries, veterans, and civilians
65. 7/17/2016
“…proteogenomics, which is -- as I used a metaphor
-- it’s like the genes are the full roster of a basketball
team….but the winning strategy comes from finding
out who their starting lineup is. The proteins are the
starters you're going to play against -- the five you
are going to have to defend against
I’m pleased to say, Mr. Prime Minister, that we've
signed three memorandums of understanding
between our two nations …we're going to be able to
share patient histories, proteogenomics and clinical
phenotypes data -- data on various proteins and
genetic characteristics of almost 60,000 patients in
Australia and the United States with full privacy
protections…
And I predict that you're going to see this repeated
around the world.”
- Vice President Biden, Australia
https://www.whitehouse.gov/the-press-office/2016/07/16/fact-sheet-victoria-comprehensive-cancer-center-vice-president-biden
74. BCRF has awarded a team science grant to Drs. Shriver
and Kuhn from the Department of Defense’s Murtha
Cancer Center and the University of Southern
California, while PCF is supporting Dr. Howard I. Scher
of Memorial Sloan Kettering Cancer Center (MSKCC)
and the Prostate Cancer Clinical Trials Consortium
(PCCTC).
The funds have been awarded to recognized leaders in
biomarker assay validation and are intended to
support pilot projects that will utilize multiple
technologies for analyzing rare events in the blood of
cancer patients and subsequently deposit the data
and associated protocols into the Blood PAC
commons.
85. Oncology Care
Model
Centers for Medicare &
Medicaid Services Innovation
Center (CMMI)
The Innovation Center is pursuing the opportunity
to further its goals of better care, smarter
spending, healthier people through an oncology
payment model.
• Episode-based
• Emphasizes practice transformation
• Multi-payer model
Nearly 3,200 physicians from 190 practices
spanning 16 commercial insurers are
participating in OCM
~150,000 unique beneficiaries/year
~200,000 episodes/year (~$6 billion/year)
~20% of CMS FFS chemo patients are in OCM
http://innovation.cms.gov/initiatives/Oncology-Care/
OCMSupport@cms.hhs.gov
Timeline: July 1, 2016-June 30, 2021
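The approximate annual figures above can be combined into two averages: spending per episode and episodes per beneficiary. The sketch below is illustrative arithmetic only. It treats the slide's "~" values as exact, and the function names are hypothetical.

```python
# Approximate annual OCM figures from the slide (treated as exact here).
EPISODES_PER_YEAR = 200_000
BENEFICIARIES_PER_YEAR = 150_000
SPEND_PER_YEAR_USD = 6_000_000_000


def average_spend_per_episode(total_usd: float, episodes: int) -> float:
    """Average spending attributed to one episode (hypothetical helper)."""
    return total_usd / episodes


def episodes_per_beneficiary(episodes: int, beneficiaries: int) -> float:
    """Average number of episodes per unique beneficiary per year (hypothetical helper)."""
    return episodes / beneficiaries


if __name__ == "__main__":
    spend = average_spend_per_episode(SPEND_PER_YEAR_USD, EPISODES_PER_YEAR)
    ratio = episodes_per_beneficiary(EPISODES_PER_YEAR, BENEFICIARIES_PER_YEAR)
    print(f"~${spend:,.0f} per episode")
    print(f"~{ratio:.2f} episodes per beneficiary per year")
```

These round numbers work out to about $30,000 per episode and about 1.33 episodes per beneficiary each year.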
86. 1) Provide Enhanced Services
• Provide OCM Beneficiaries with 24/7 access
to an appropriate clinician who has real-
time access to the Practice’s medical
records
• Provide the core functions of patient
navigation to OCM Beneficiaries
• Document a care plan for each OCM
Beneficiary that contains the 13
components in the Institute of Medicine
Care Management Plan
• Treat OCM Beneficiaries with therapies that
are consistent with nationally recognized
clinical guidelines
2) Use certified electronic health record
technology (CEHRT)
3) Utilize data for continuous quality
improvement
Novel Therapies Adjustment
• Potential adjustment based on the proportion of
each practice’s average episode expenditures for
novel therapies
– Includes oncology drugs that received FDA
approval after December 31, 2014
– Use of the novel therapy must be consistent with
the FDA-approved indications for inclusion in the
adjustment
– Oncology drugs are considered “new” for 2 years
from FDA approval for that specific indication
Ron Kline, MD
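The three Novel Therapies Adjustment conditions can be read as a simple predicate for a single drug use. The sketch below is a hypothetical simplification with illustrative names. It makes three assumptions:
• the two-year window ends just before the second anniversary of approval;
• "on_label" stands in for "consistent with the FDA-approved indication";
• how CMS actually turns the proportion of novel-therapy spending into a payment adjustment is ignored.

```python
from datetime import date

# From the slide: drugs approved after December 31, 2014 may qualify,
# and count as "new" for 2 years from approval for that indication.
APPROVAL_CUTOFF = date(2014, 12, 31)
NOVELTY_WINDOW_YEARS = 2


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # Feb 29 in a non-leap target year
        return d.replace(year=d.year + years, day=28)


def counts_as_novel_therapy(approval_date: date, use_date: date, on_label: bool) -> bool:
    """Hypothetical check of one drug use against the slide's three conditions."""
    if not on_label:  # use must match the FDA-approved indication
        return False
    if approval_date <= APPROVAL_CUTOFF:  # must be approved after Dec 31, 2014
        return False
    # Assumed window: from approval up to (not including) the 2-year anniversary.
    return approval_date <= use_date < add_years(approval_date, NOVELTY_WINDOW_YEARS)


if __name__ == "__main__":
    print(counts_as_novel_therapy(date(2015, 6, 1), date(2016, 6, 1), on_label=True))
    print(counts_as_novel_therapy(date(2015, 6, 1), date(2017, 6, 2), on_label=True))
```

The first call falls inside the window and returns True. The second falls after the two-year anniversary and returns False.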
90. Big Data Scientist Training Enhancement Program
(BD-STEP)
Graduates of BD-STEP would:
• have skillsets to perform next-generation patient-
centered outcomes research by manipulating and
analyzing large-scale, multi-element, patient data sets
to develop novel disease signatures or unique
performance-based clinical benchmarks
• have an understanding of real-time, performance-
driven health care delivery in the VA systems
Michelle Berny-Lang, NCI; Connie Lee, VHA/EES
2017 Potential Partners:
94. National Cancer Data Ecosystem
• Genomic Data Commons
• Imaging Data Commons
• Proteomics Data Commons
• Clinical Data Commons (Cohorts / Indiv.)
• SEER (Populations)
• Blood Profiling Atlas Commons
Data Standards, Validation and Harmonization
Data Contributors and Consumers: Researchers, Patients, Clinicians, Institutions
98. NCI CSSI Science Day 2015
5/18/2015
“…Cancer research initiatives for trans-NCI benefit (started in 2003; total awards ~100 million/year; 25% first time NIH grantees…” - Doug Lowy, 2015
Breakdown of Contact PI Status of New Awards Solicited by CSSI RFAs (FY05-FY14)
101. Learn More About Us…
http://cssi.cancer.gov
Jerry S.H. Lee, PhD
jerry.lee@nih.gov
@NCI_CSSI
@jleePSOC
102. Development of a Natural Language Processing
(NLP) Workbench Web Service
• Two Year Project (July 2016 – September 2018)
• Project Goals:
– Develop a Natural Language Processing (NLP) Workbench that utilizes
Web Services for analyzing unstructured clinical information
– Pilots for use in cancer registries and safety surveillance domains
– Code cancer data items to nationally adopted coding systems (ICD-O-3)
– Collect data from at least four national laboratories for the following
primary cancer sites (Breast, Lung, Prostate, Colorectal)
• 125 cases per cancer site from each laboratory for a total of at least 2,000 cases.
– Double annotation will be completed by certified tumor registrars with
a master reviewer
NLPWorkbench@cdc.gov
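The project plans double annotation by certified tumor registrars, with a master reviewer. One common way to run that kind of process has two parts: measure how often the two annotators agree, then send each disagreement to the reviewer. The slide names neither an agreement metric nor an adjudication rule. The sketch below therefore assumes Cohen's kappa and a hypothetical "master_review" callback. The example labels are real ICD-O-3 topography codes (C50.9 breast, C34.9 lung, C61.9 prostate, C18.9 colon), but the records themselves are made up.

```python
from collections import Counter
from typing import Callable, Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who coded the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("annotations must be non-empty and aligned")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal label frequencies.
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1.0:  # both used one identical label for every item
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


def adjudicate(a: Sequence[str], b: Sequence[str],
               master_review: Callable[[int], str]) -> list[str]:
    """Keep agreed codes; send each disagreement (by index) to the master reviewer."""
    if len(a) != len(b):
        raise ValueError("annotations must be aligned")
    return [x if x == y else master_review(i) for i, (x, y) in enumerate(zip(a, b))]


if __name__ == "__main__":
    registrar_1 = ["C50.9", "C34.9", "C61.9", "C18.9"]
    registrar_2 = ["C50.9", "C34.9", "C61.9", "C34.9"]
    print(round(cohen_kappa(registrar_1, registrar_2), 3))
    print(adjudicate(registrar_1, registrar_2, lambda i: "C18.9"))
```

With three of four codes in agreement, kappa is 2/3, about 0.667. The one disagreement (index 3) is resolved by the reviewer's code.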
103. NLP Workbench Web Service: Dos and Don’ts (Sandy Jones, CDC)
• Will include processes with demonstrated
efficiency - is more than a collection of
general NLP components and workflows
• Will cover certain needs - cannot be the
panacea for all problems
• Will describe the process for the generation
of annotated datasets
• Intend to incorporate only open-source
solutions equipped to support the project
objectives and will not endorse ANY existing
solution