The National COVID Cohort Collaborative (N3C):
Let’s Get Involved!
Warren A. Kibbe, PhD, FACMI
June 15, 2021
Purdue Big Data in Cancer Workshop
@data2health
@ncats_nih_gov
covid.cd2h.org
ncats.nih.gov/n3c
@wakibbe
Speaker Objectives
Warren Kibbe
Duke Biostatistics & Bioinformatics
CTSA Informatics
Duke Cancer Institute
Member N3C
● Real World Data
● Open Science
● Overview of N3C
● N3C Data Enclave statistics
● How common data models and variables
are harmonized
● The scope of answerable questions
● Data access and security
● Oncology research in N3C
A program of NIH’s National Center
for Advancing Translational Sciences
Special thanks to:
● Chris Chute, N3C, Johns Hopkins
● Melissa Haendel, N3C, Colorado University
● Umit Topaloglu, N3C, Wake Forest
● Frank Rockhold, Duke
● Noha Sharafeldin, N3C, UAB
Take homes
• N3C represents a unique resource to examine effects of COVID-19 on cancer
outcomes
• Largest COVID-19 and cancer cohort within the US
• Consistent with previous literature, older age, male gender, increasing comorbidities,
and hematological malignancies were associated with higher mortality in patients with
cancer and COVID-19
• The N3C dataset confirmed that cancer patients with COVID-19 who received recent
immunotherapies or targeted therapies were not at higher risk of overall mortality
What is Real World Data?
Collected in the context of patient care. Real World Data was called out
as part of the 21st Century Cures Act.
21st Century Cures Act: https://www.fda.gov/regulatory-information/selected-amendments-fdc-act/21st-century-cures-act
Graphic from HealthCatalyst: https://www.healthcatalyst.com/insights/real-world-data-chief-driver-drug-development
Our ability to generate biomedical
data continues to grow in terms of
variety and volume
Current sources of data
molecular genome pathology imaging labs notes sensors
icons by the Noun Project
AI is changing our ability to go both
deep and broad
Trustworthy AI
Provenance
Reusable
Reproducible
Having a health equity lens
● Digital Health, precision medicine, and real world data
all have the power to transform healthcare. However,
we must pay attention to structural racism and implicit
bias if we want to achieve equity.
21st Century Cures Act
Last year I discussed the NCI Cancer
Moonshot and Precision Medicine
activities funded under the 21st Century
Cures Act
FDA was directed by Congress to focus
on the use of real world data (RWD) and
real world evidence (RWE) in drug
design, development and outcomes
assessment
https://www.fda.gov/regulatory-information/selected-amendments-fdc-act/21st-century-cures-act
Is it just about Real World Data?
What about Open Science? Data transparency? Data Access?
The importance of Open Science
Calls for greater transparency and ‘open data access’ in clinical research
continue.
● “Open science is the movement to make scientific research, data and
dissemination accessible to all levels of an inquiring society”*
● Open Science Project**: “If we want open science to flourish, we should
raise our expectations to: Work. Finish. Publish. Release.”
● FAIR Principles: Findability, Accessibility, Interoperability, and Reusability***
● TRUST Principles: Transparency, Responsibility, User focus, Sustainability
and Technology****
* https://www.fosteropenscience.eu/resources
** http://openscience.org/
*** https://www.nature.com/articles/sdata201618
****https://www.nature.com/articles/s41597-020-0486-7
Open Science and Patient Data Access
Some of the challenges are:
● Patient privacy
● Academic credit
● Commercial sensitivity and intellectual property
● Data standards
● Resources (money and people)
There should be room for researchers and patients alike to gain from this effort.
Informatics experts and data scientists are essential elements of this discussion.
One problem with Clinical Trials Data Sharing
● “The tendency for researchers to ‘sit’ on their data for an unduly long period
of time is neither desirable from a scientific point of view nor acceptable from
an ethical perspective.”*
● “After all, the data belong to the patients who agreed to participate in the
research, not to the investigators who coordinated it, as the new European
General Data Protection Regulation emphasizes.”*
*Rockhold, F, et al. Open science: The open clinical trials data journey, Clinical Trials, Vol 16 (5) 1-8, 2019
Access to patient-level data is important for research
There are certainly challenges, but the question is not whether data should be
shared, but rather how and when access should be granted.
Responsible open access enables secondary analyses that:
● Enhance reproducibility of clinical research
● Honor the contributions of trial participants
● Improve the design of future trials
● Generate new research findings
This journey of making patient data available is part of an evolution in
transparency and not a sudden awakening.
What about N3C?
It is an open science, controlled access environment
Clinical and Translational Science
Awards (CTSA) Program
● Algorithms (diagnosis, triage, predictive, etc.)
● Drug discovery & pharmacogenetics
● Multimodal analytics (EHR, imaging, genomics)
● Interventions that reduce disease severity
● Best practices for resource allocation
● Coordinated research efforts to maximize efficiency and
reproducibility
These all require the creation
of a comprehensive clinical data set
The pandemic highlights urgent needs
What Kinds of Questions Can N3C Address?
The scope and scale of the information in the platform
will support probing questions such as:
● What social determinants of health are risk factors for mortality?
● Do some therapies work better than others? By region? By demographics?
● Can we compare local rare clinical observations with national occurrences?
● Can we predict who might have severe outcomes if they have COVID-19?
● What factors will predict the effectiveness of vaccines?
● Can we predict acute kidney injury in COVID-19 patients?
● Who might need a ventilator because of lung failure?
Cohort characterization objectives
To clinically characterize the N3C cohort
● Largest U.S. COVID-19 cohort to date (+ representative controls)
● Racially, ethnically, and geographically diverse
To develop and share validated, versioned OMOP representations of
common variables (labs, vital signs, medications, treatments)
To generate hypotheses to be tested within N3C and elsewhere
● Clinical phenotypes and trajectories
● Treatment patterns and response
● … and many others
Benefits for Participation
● Access to large-scale COVID-19 data from across the nation
● Pilot data for grant proposals
● Opportunities for KL2, TL1, and other scholars
● Team science opportunities for new questions and access to
teams, statistics, machine learning (ML), and informatics
expertise
● Learn ML analytics and NLP methods, with access to tools, software,
and additional datasets
Who is in the N3C?
The N3C Computable Phenotype
● At a high level, our phenotype looks for patients:
○ With a positive COVID-19 test (PCR or antibody) OR
○ With an ICD-10-CM code of U07.1 OR
○ With two or more COVID-like diagnosis codes (ARDS, pneumonia, etc.) during the
same encounter, but only on or prior to 5/1/2020
● Each one of these patients is then demographically matched to two patients with
negative or equivocal COVID-19 tests.
● Each site securely sends this set of patients, along with their longitudinal EHR
data from 1/1/2018 to the present, to the N3C on a regular basis.
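The inclusion logic above can be sketched roughly as follows. This is a minimal illustration, not the N3C phenotype scripts: the Encounter record, its field names, and the contents of COVID_LIKE_CODES are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical stand-in for the "COVID-like" diagnosis list (ARDS, pneumonia, etc.).
COVID_LIKE_CODES = {"J80", "J12.89", "J18.9"}
# The COVID-like rule only counts on or prior to 5/1/2020.
COVID_LIKE_CUTOFF = date(2020, 5, 1)


@dataclass
class Encounter:
    encounter_date: date
    dx_codes: set = field(default_factory=set)  # ICD-10-CM codes on this encounter
    positive_covid_test: bool = False           # PCR or antibody


def meets_phenotype(encounters):
    """Return True if any encounter satisfies one of the three criteria."""
    for enc in encounters:
        # Criterion 1: positive COVID-19 test
        if enc.positive_covid_test:
            return True
        # Criterion 2: ICD-10-CM code U07.1
        if "U07.1" in enc.dx_codes:
            return True
        # Criterion 3: two or more COVID-like codes in the same encounter,
        # only on or before the cutoff date
        covid_like = enc.dx_codes & COVID_LIKE_CODES
        if len(covid_like) >= 2 and enc.encounter_date <= COVID_LIKE_CUTOFF:
            return True
    return False
```

The date cutoff applies only to the third criterion, so a patient with two COVID-like codes in June 2020 and no test or U07.1 code would not qualify in this sketch.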
Matching algorithm (example)
Patient     Age   Gender   Race    Ethnicity         COVID status
Case        47    M        Black   Unknown           Positive
Control 1   49    M        Black   Hispanic/Latino   Negative
Control 2   46    M        Black   Not Hispanic      Negative
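One way such demographic matching could work is sketched below. The slides do not give the actual matching rules, so everything here is an assumption: exact match on gender and race, age within a hypothetical tolerance (max_age_gap), closest ages preferred, and ethnicity not required to match (as in the example, where the three ethnicities differ).

```python
def match_controls(case, candidates, n_controls=2, max_age_gap=5):
    """Pick up to n_controls COVID-negative patients resembling the case.

    case and candidates are dicts with "age", "gender", and "race" keys.
    max_age_gap is a hypothetical tolerance, not a documented N3C value.
    """
    eligible = [
        c for c in candidates
        if c["gender"] == case["gender"]
        and c["race"] == case["race"]
        and abs(c["age"] - case["age"]) <= max_age_gap
    ]
    # Prefer the candidates closest in age to the case.
    eligible.sort(key=lambda c: abs(c["age"] - case["age"]))
    return eligible[:n_controls]
```

A production matcher would also need to avoid reusing the same control for several cases; that bookkeeping is left out of this sketch.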
N3C Timeline
N3C Dashboard
covid.cd2h.org/dashboard
55 sites with data released (purple) and 37 sites with
data pending (open circle). OCHIN is a national network
of 131 sites (diamond).
covid.cd2h.org/teams
31 Domain teams!
As of June 14, 2021
https://ncats.nih.gov/n3c/resources/data-contribution/data-transfer-agreement-signatories
Data Transfer Agreement Signatories
6/14/2021
88 DTA Signatories
Northwestern University at Chicago ᛫ Tufts Medical Center ᛫ Advocate Health Care Network ᛫ University of Alabama at Birmingham ᛫ Oregon Health & Science University ᛫
University of Washington ᛫ Stanford University ᛫ The University of Michigan at Ann Arbor ᛫ Children's Hospital Colorado ᛫ Duke University ᛫ Medical College of Wisconsin ᛫ The
Ohio State University ᛫ University of Nebraska Medical Center ᛫ University of Arkansas for Medical Sciences ᛫ George Washington University ᛫ Johns Hopkins University ᛫ West
Virginia University ᛫ Medical University of South Carolina ᛫ University of North Carolina at Chapel Hill ᛫ University of Virginia ᛫ The University of Texas Medical Branch at Galveston
᛫ University of Minnesota ᛫ University of Cincinnati ᛫ Columbia University Irving Medical Center ᛫ Cincinnati Children's Hospital Medical Center ᛫ Rush University Medical Center ᛫
Nemours ᛫ University of Wisconsin-Madison ᛫ The State University of New York at Buffalo ᛫ Washington University in St. Louis ᛫ University of Rochester ᛫ The University of
Chicago ᛫ University of Miami ᛫ The Scripps Research Institute ᛫ University of Texas Health Science Center at San Antonio ᛫ University of Kentucky ᛫ University of Illinois at
Chicago ᛫ Virginia Commonwealth University ᛫ Weill Medical College of Cornell University ᛫ Carilion Clinic ᛫ University Medical Center New Orleans ᛫ The University of Iowa ᛫
Emory University ᛫ Maine Medical Center ᛫ The University of Texas Health Science Center at Houston ᛫ Boston University Medical Campus ᛫ The University of Utah ᛫ University of
Southern California ᛫ George Washington Children's Research Institute ᛫ University of Colorado Denver I Anschutz Medical Campus ᛫ Mayo Clinic Rochester ᛫ The Rockefeller
University ᛫ Montefiore Medical Center ᛫ University of Mississippi Medical Center ᛫ University of Oklahoma Health Sciences Center, Board of Regents ᛫ University of
Massachusetts Medical School Worcester ᛫ Aurora Health Care ᛫ Penn State ᛫ University of New Mexico Health Sciences Center ᛫ NorthShore University HealthSystem ᛫ Wake
Forest University Health Sciences ᛫ Vanderbilt University Medical Center ᛫ Regenstrief Institute ᛫ Brown University ᛫ Stony Brook University ᛫ University of California, Davis ᛫ Yale
New Haven Hospital ᛫ Rutgers, The State University of New Jersey ᛫ MedStar Health Research Institute ᛫ Loyola University Chicago ᛫ Loyola University Medical Center ᛫
University of Delaware ᛫ Children's Hospital of Philadelphia
N3C Enclave Data Stats
Pediatric cases
N3C Enclave Data Stats
Predicting Clinical Severity using machine
learning (64 input variables)
The most powerful predictors are patient age and widely available
vital sign and laboratory values.
The National COVID Cohort Collaborative: Clinical
Characterization and Early Severity Prediction
https://pubmed.ncbi.nlm.nih.gov/33469592/
How does data get into N3C?
● We have gone through the high-level purpose – EHR data about COVID-19
patients
● Identified the contributing sites
● Know what the inclusion criteria for N3C are – documented COVID-19 testing
● Seen the dashboard overview of N3C and the overall cohort characteristics
● What are the data ingestion, harmonization, query, and publication processes?
● Data governance and security?
● And finally, what about cancer and COVID-19?
Leveraging Common Data Models
● These four data models are commonly used by
academic medical centers throughout the US.
● CDMs are used to store EHR data in a
consistent way.
● Sites participating in N3C may send data in one
of these four formats—the idea is to make it
as convenient as possible for sites to submit.
● Common data models also allow us to write a
consistent computable phenotype that can be
run with few local changes at sites with one or
more of these data models.
Harmonization of N3C Data
Data Availability vs Utility
● Collections of data are not always useful
● Even if they are available
● Consistently classified data is
always more useful
FAIR: Findable, Accessible,
Interoperable, Reusable
What does Interoperable mean with respect to data? Harmonized!
Syntactic Interoperability (harmonization)
● One can make sense of the structure
● Metaphor: sentence has good grammar
● Domain of the data standards and data model communities
Semantic interoperability (harmonization)
● One can make sense of the meaning
● Metaphor: the words are understandable
● Domain of the vocabulary, ontology, classification communities
N3C Data Ingestion & Harmonization Pipeline
(future)
Span manual curation of mapping resources to industrial scale production transformation
Harmonized, not Homogenous
CDMs are built for purpose. Different CDMs emphasize and prioritize different things.
Secure, reproducible, transparent, versioned, provenanced, attributed,
and shareable analytics on patient-level EHR data
Collaborative Analytics - N3C Secure Data Enclave
Federated versus Centralized DQ
Many clinical data research networks are federated; N3C is centralized. Centralized datasets
have some advantages where data quality assessment is concerned.
Federated Network vs. Centralized Data
Centralized: questions asked directly against all sites’ data combined
Federated versus Centralized DQ
With federated data, sites are benchmarked against
themselves.
With centralized data, sites can be benchmarked
against each other.
Site 1: “We have 43 qualifying inpatient visits.”
Site 2: “We have 27 qualifying inpatient visits.”
Site 3: “We have 806 qualifying inpatient visits.”
Site  Patient  Visit Type  Adm. Date  Disc. Date
1     123      IP          7/4/2020   7/8/2020
1     456      IP          5/6/2020   5/20/2020
2     987      IP          8/2/2019   8/7/2019
2     654      IP          9/3/2019   9/14/2019
3     234      IP          1/26/2021  1/26/2021
3     234      IP          1/26/2021  1/29/2021
3     234      IP          1/26/2021  1/30/2021
3     234      IP          1/26/2021  1/27/2021
Clearly, sites differ in how they define “a visit.”
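With the rows pooled centrally, a cross-site comparison like the one above takes only a few lines. The sketch below is illustrative only: the tuple layout and the function name are assumptions, and the rows are copied from the example table.

```python
from collections import Counter

# Rows from the example table: (site, patient, visit_type, admit, discharge)
VISITS = [
    (1, 123, "IP", "7/4/2020", "7/8/2020"),
    (1, 456, "IP", "5/6/2020", "5/20/2020"),
    (2, 987, "IP", "8/2/2019", "8/7/2019"),
    (2, 654, "IP", "9/3/2019", "9/14/2019"),
    (3, 234, "IP", "1/26/2021", "1/26/2021"),
    (3, 234, "IP", "1/26/2021", "1/29/2021"),
    (3, 234, "IP", "1/26/2021", "1/30/2021"),
    (3, 234, "IP", "1/26/2021", "1/27/2021"),
]


def inpatient_visits_per_patient(rows):
    """Average number of inpatient visit rows per patient, by site."""
    visit_counts = Counter()
    patients = {}
    for site, patient, visit_type, _adm, _disc in rows:
        if visit_type != "IP":
            continue
        visit_counts[site] += 1
        patients.setdefault(site, set()).add(patient)
    return {site: visit_counts[site] / len(patients[site]) for site in visit_counts}
```

On these rows, sites 1 and 2 average one inpatient row per patient while site 3 averages four. A federated query that only returns each site's own count cannot surface this kind of outlier.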
N3C’s DQ Process
How Would N3C Deal with This Finding?
● Discover and discuss at weekly DQ meetings.
● Determine: Is this an issue…
○ For the site to fix?
○ For us to handle on our end?
● Reach out to the site to get more information.
○ What if they can’t fix it?
We can write an algorithm to make this
site’s visits look more like the other sites’:
if:
● the visit type is inpatient
● and there is more than one visit per patient
per day
then:
● merge them into a single “macro”
visit
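The rule above might be implemented along these lines. This is a sketch under stated assumptions, not N3C's production transformation: "per day" is read as "same admission date", the merged visit keeps the latest discharge date, and the tuple layout and function name are hypothetical.

```python
from datetime import datetime


def _parse(text):
    # Dates in the example table are written month/day/year.
    return datetime.strptime(text, "%m/%d/%Y").date()


def merge_macro_visits(rows):
    """Collapse same-day inpatient rows per patient into one macro visit.

    rows: iterable of (site, patient, visit_type, admit, discharge) with
    date strings. Returns sorted tuples with date objects.
    """
    latest_discharge = {}  # (site, patient, admit_date) -> latest discharge date
    other = []             # non-inpatient rows pass through unchanged
    for site, patient, visit_type, adm, disc in rows:
        adm_d, disc_d = _parse(adm), _parse(disc)
        if visit_type != "IP":
            other.append((site, patient, visit_type, adm_d, disc_d))
            continue
        key = (site, patient, adm_d)
        if key not in latest_discharge or disc_d > latest_discharge[key]:
            latest_discharge[key] = disc_d
    merged = [(s, p, "IP", a, d) for (s, p, a), d in latest_discharge.items()]
    return sorted(merged + other)
```

Applied to the example table, the four site 3 rows for patient 234 become a single visit from 1/26/2021 to 1/30/2021, and the rows from sites 1 and 2 are unchanged.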
N3C’s DQ Process
Original Table
Site  Patient  Visit Type  Adm. Date  Disc. Date
1     123      IP          7/4/2020   7/8/2020
1     456      IP          5/6/2020   5/20/2020
2     987      IP          8/2/2019   8/7/2019
2     654      IP          9/3/2019   9/14/2019
3     234      IP          1/26/2021  1/26/2021
3     234      IP          1/26/2021  1/29/2021
3     234      IP          1/26/2021  1/30/2021
3     234      IP          1/26/2021  1/27/2021
DQ fix
Ready for Analysis
Site  Patient  Visit Type  Adm. Date  Disc. Date
1     123      IP          7/4/2020   7/8/2020
1     456      IP          5/6/2020   5/20/2020
2     987      IP          8/2/2019   8/7/2019
2     654      IP          9/3/2019   9/14/2019
3     234      IP          1/26/2021  1/30/2021
Takeaways
● Centralized DQ processes allow us to fully
realize the potential of N3C’s large sample size.
● All transformations are fully logged and always
completely reversible if needed.
Harmonizing numeric data
● Problem: Different sites provide their
data in different units
● Solution: Harmonize each to a standard
unit
Kilograms = Pounds / 2.20462
Kilograms = Ounces / 35.274
Kilograms = Grams / 1000
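The conversions above translate directly into a lookup table of divisors. In this minimal sketch the unit labels ("kg", "lb", "oz", "g") and the function name are assumptions; real site data would need a mapping from each site's unit strings to these labels first.

```python
# Divisors taken from the conversions above (value / divisor = kilograms).
TO_KG_DIVISOR = {
    "kg": 1.0,
    "lb": 2.20462,
    "oz": 35.274,
    "g": 1000.0,
}


def to_kilograms(value, unit):
    """Convert a body-weight measurement to kilograms."""
    try:
        divisor = TO_KG_DIVISOR[unit.strip().lower()]
    except KeyError:
        raise ValueError(f"no known conversion for unit {unit!r}")
    return value / divisor
```

Raising on an unknown unit, rather than passing the value through, keeps unconverted measurements from silently mixing with harmonized ones.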
Harmonizing numeric data
● Problem: Some units are missing
● Solution 1: Contact the source
● Solution 2: N3C inference engine
Kilograms = x / 2.20462 ?
Kilograms = x / 35.274 ?
Kilograms = x / 1000 ?
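The slides do not describe how the N3C inference engine works, so the following is only one plausible approach, stated as an assumption: try each candidate conversion on a batch of unit-less values and keep the unit whose converted median is plausible and closest to a reference median. The reference median (80 kg) and plausibility range are hypothetical placeholders; in practice they could be estimated from sites that do report units.

```python
import math
from statistics import median

# Same divisors as the conversions above (value / divisor = kilograms).
DIVISORS = {"kg": 1.0, "lb": 2.20462, "oz": 35.274, "g": 1000.0}


def infer_weight_unit(values, reference_median_kg=80.0, plausible_kg=(20.0, 300.0)):
    """Guess the unit of a batch of body weights whose unit is missing.

    Returns the unit label whose converted median is plausible and closest
    (on a log scale) to the reference median, or None if none is plausible.
    """
    observed = median(values)
    best_unit, best_distance = None, math.inf
    for unit, divisor in DIVISORS.items():
        converted = observed / divisor
        if not plausible_kg[0] <= converted <= plausible_kg[1]:
            continue  # e.g. a 75,000 kg human is not plausible
        distance = abs(math.log(converted / reference_median_kg))
        if distance < best_distance:
            best_unit, best_distance = unit, distance
    return best_unit
```

A plausibility range alone cannot separate kilograms from pounds (70 could be either), which is why this sketch also compares against a reference distribution. Batches that return None would fall back to Solution 1, contacting the source.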
Harmonization progress
● Harmonized measurements
○ By original unit
○ Across many sites
Homogeneity
after
harmonization
Humans measured in grams do not
look the same as humans measured
in kilograms!
Unit harmonization progress
Canonical unit
Uses a known conversion
Unit not plausible
Missing unit inferred
Unit still missing
● ~2x increase in usable data from our
harmonization procedures
We can rescue
a lot of data!
Pharyngalgia = Sore throat
Plain-language medical vocabulary for precision
diagnosis. Nat Genet. 2018 50:474-476.
Long-COVID phenotypes are myriad
Patient-reported and researcher-measured phenotypes are starkly different
Map literature and patient-reported terms to HPO
N3C Harmonization Takeaways
What N3C has revealed most in terms of needs:
● Interoperability - we need syntactic and semantic!
○ FHIR ⇒ OMOP (syntactic)
○ Common vocabulary/codeset mapping provenance
and management (semantic)
● Approach data harmonization from an end-to-end data
life cycle perspective
● Leverage USCDI, but build for
interoperable semantic modeling
and extensions
Governing N3C Data
The goal of the Data Use Agreement is privacy protection
to promote broad access:
● COVID-Related research only
● NIH housed secure repository
● No re-identification of individuals or data source
● No download or capture of raw data
● Open platform to all researchers
● Investigator activities are recorded and can be
audited for security and reproducibility
N3C: Unique Data Use and Privacy
N3C: Governance and Access
Data Levels to Access
The goal of the Data Use Agreement is privacy protection to promote broad access:
● COVID-Related research only
● No re-identification of individuals or data source
● No download or capture of raw data
● Open platform to all researchers
● Security: Activities in the N3C Data Enclave are recorded and can be audited
● Disclosure of research results to the N3C Data Enclave for the public good
● Analytics provenance
● Contributor Attribution tracking
Data Use and Privacy
● Transparent and collaborative environment where all contributions are acknowledged
● Provenance and reproducibility
● Promptly sharing research results with N3C users
● Publish in high-impact journals
● Attribution for all N3C artifacts
N3C Attribution and Publication Principles
Researchers, projects, and
artifacts are all linked
together in the enclave
using the Contributor
Attribution Model (CAM).
N3C Provenance, Transparency,
Attribution & Rapid Sharing
N3C Data Access: Process
Data Use Request
HSP / Security Training
Data Use
Agreement
https://ncats.nih.gov/n3c/about/applying-for-access
Realizing Team Science
Key functions can
nucleate projects:
● Education & training
● Biostatistics
● Study design
● Evaluation
● Informatics
● Clinical expertise
● Innovation &
commercialization
● Community &
partnerships
N3C Domain Team Expertise:
● Enclave technology
● Data model (OMOP)
● Terminologies
● Data quality
● Codesets, variables,
phenotype
● Using/parsing N3C data
● Workflows, methods,
algorithms
Roles
Ingredients (Methods, datasets, instruments)
Scientific questions
N3C team Science within & across institutions
https://covid.cd2h.org/domain-teams
CTSAs
OUTCOMES OF COVID-19 IN
CANCER PATIENTS: REPORT
FROM THE NATIONAL COVID
COHORT COLLABORATIVE
(N3C)
Noha Sharafeldin, Benjamin Bates, Qianqian Song, Vithal Madhira, Yao
Yan, Sharlene Dong, Eileen Lee, Nathaniel Kuhrt, Yu Raymond Shao,
Feifan Liu, Timothy Bergquist, Justin Guinney, Jing Su, Umit Topaloglu
on behalf of the N3C Consortium
Given on June 4, 2021
https://covid.cd2h.org/ cd2h.slack.com @data2health
N3C Oncology Domain Team (ODT)
Noha Sharafeldin, MBBCh, PhD
Leadership
Benjamin Bates, MD, Rutgers University
Umit Topaloglu, PhD, Wake Forest University
Noha Sharafeldin, MD, PhD, The University of Alabama at Birmingham
https://covid.cd2h.org/oncology
Slack channel: #n3c-tt-oncology
N3C ODT Expertise
Noha Sharafeldin
Informatics Biostatistics Clinical Epidemiology N3C data and Logic
Umit Topaloglu Jing Su Benjamin Bates Justin Guinney Vithal Madhira Tim Bergquist
Feifan Liu Qianqian Song Yu Raymond Shao Nate Kuhrt Sharlene Dong Eileen Lee
Yao Yan
N3C Oncology
http://ascopubs.org/doi/full/10.1200/JCO.21.01074
N3C Cancer Cohort
Primary Diagnosis
Primary Outcome
• All-cause mortality
Secondary Outcomes
(Clinical severity indicators
requiring hospitalization)
• Mechanical Ventilation
Demographic, clinical, and tumor characteristics
COVID-19 Positive (chart values listed in legend order)
Age: 18-29, 2%; 30-49, 13%; 50-64, 31%; 65+, 54%
Race: Hispanic, 4%; Non-Hispanic Black, 13%; Non-Hispanic White, 61%; Other or Unknown, 22%
Sex: Female, 51%; Male, 49%
Geographical Location: US-Northeast, 11%; US-Midwest, 34%; US-South, 28%; US-West, 5%; Unknown, 22%
Demographic, clinical, and tumor characteristics
COVID-19 Positive (chart values listed in legend order)
Smoking status: Non-smoker, 86%; Current or Former smoker, 14%
Adjusted CCI: 0, 41%; 1, 16%; 2, 9%; 3, 6%; ≥4, 28%
Demographic, clinical, and tumor characteristics
COVID-19 Positive (chart values listed in legend order)
Type of primary malignancy: Solid, 71%; Liquid, 12%; Multi-Site, 11%; Unknown, 3%; Undefined Primary, 3%
By cancer group (bar chart): Skin cancers, 15%; Breast cancer, 14%; Prostate cancer, 12%; Hematological cancers, 12%; Gastrointestinal cancers, 9%; Multi-site, 11%
COVID-19 Treatment
COVID-19 Treatment (Yes)   COVID positive (n=38,614)
Systemic antibiotics       4032 (15.75%)
Systemic steroids          3514 (13.73%)
Azithromycin               1197 (4.68%)
Remdesivir                 1047 (4.09%)
Dexamethasone              1029 (4.02%)
Hydroxychloroquine (HCQ)   364 (1.42%)
Death and invasive ventilation in hospitalized patients
Outcome               COVID positive (n=19,515)  COVID negative (n=184,988)
Death                 2,894 (14.8%)              23,207 (12.5%)
Invasive Ventilation  1,606 (8.2%)               9,576 (5.2%)
Survival Probability –
by COVID status
HR = 1.20 (95%CI: 1.15 – 1.24, p<0.001)
Survival Probability by
cancer type among
COVID positive patients
Hazard ratios associated with 1-year all-cause
mortality among COVID-positive patients
Limitations
• RWD Challenges (e.g. data missingness)
• Limited capture of recent cancer therapy
• Potential misclassification of cancer patients
• Challenges in primary cancer diagnosis mapping and limited
historical data
• Method for construction of COVID-19 negative control
Conclusions
• N3C represents a unique resource to examine effects of COVID-19 on cancer outcomes
• Largest COVID-19 and cancer cohort within the US
• Consistent with previous literature, older age, male gender, increasing comorbidities,
and hematological malignancies were associated with higher mortality in patients with
cancer and COVID-19
• The N3C dataset confirmed that cancer patients with COVID-19 who received recent
immunotherapies or targeted therapies were not at higher risk of overall mortality
Acknowledgements
The Patients
US Data Partners
N3C Consortial Authors
Christopher Chute
Melissa Haendel
Amit Mitra
Ramakanth Kavuluru
NCATS U24 TR002306
NIGMS 5U54GM104942-04
NCI P30CA012197 [UT, QS]
LLS 3386-19 [NS]
Indiana University Precision Health
Initiative [JS]
N3C Core Teams
Acknowledgements
We gratefully acknowledge contributions from the following N3C core teams:
• Principal Investigators: Melissa A. Haendel*, Christopher G. Chute*, Kenneth R. Gersing, Anita Walden
• Workstream, subgroup and administrative leaders: Melissa A. Haendel*, Tellen D. Bennett, Christopher G. Chute, David A. Eichmann, Justin Guinney, Warren A.
Kibbe, Hongfang Liu, Philip R.O. Payne, Emily R. Pfaff, Peter N. Robinson, Joel H. Saltz, Heidi Spratt, Justin Starren, Christine Suver, Adam B. Wilcox, Andrew E.
Williams, Chunlei Wu
• Key liaisons at data partner sites
• Regulatory staff at data partner sites
• Individuals at the sites who are responsible for creating the datasets and submitting data to N3C
• Data Ingest and Harmonization Team: Christopher G. Chute*, Emily R. Pfaff*, Davera Gabriel, Stephanie S. Hong, Kristin Kostka, Harold P. Lehmann, Richard A.
Moffitt, Michele Morris, Matvey B. Palchuk, Xiaohan Tanner Zhang, Richard L. Zhu
• Phenotype Team (Individuals who create the scripts that the sites use to submit their data, based on the COVID and Long COVID definitions): Emily R. Pfaff*,
Benjamin Amor, Mark M. Bissell, Marshall Clark, Andrew T. Girvin, Stephanie S. Hong, Kristin Kostka, Adam M. Lee, Robert T. Miller, Michele Morris, Matvey B.
Palchuk, Kellie M. Walters
• Project Management and Operations Team: Anita Walden*, Yooree Chae, Connor Cook, Alexandra Dest, Racquel R. Dietz, Thomas Dillon, Patricia A. Francis, Rafael
Fuentes, Alexis Graves, Julie A. McMurry, Andrew J. Neumann, Shawn T. O'Neil, Andréa M. Volz, Elizabeth Zampino
• Partners from NIH and other federal agencies: Christopher P. Austin*, Kenneth R. Gersing*, Samuel Bozzette, Mariam Deacy, Nicole Garbarini, Michael G. Kurilla,
Sam G. Michael, Joni L. Rutter, Meredith Temple-O'Connor
• Analytics Team (Individuals who build the Enclave infrastructure, help create codesets, variables, and help Domain Teams and project teams with their datasets):
Benjamin Amor*, Mark M. Bissell, Katie Rebecca Bradwell, Andrew T. Girvin, Amin Manna, Nabeel Qureshi
• Publication Committee Management Team: Mary Morrison Saltz*, Christine Suver*, Christopher G. Chute, Melissa A. Haendel, Julie A. McMurry, Andréa M. Volz,
Anita Walden
• Publication Committee Review Team: Carolyn Bramante, Jeremy Richard Harper, Wenndy Hernandez, Farrukh M Koraishy, Federico Mariona, Saidulu Mattapally,
Amit Saha, Satyanarayana Vedula
N3C Registration/Training
https://covid.cd2h.org/tutorials
Training Office Hours:
Tuesdays & Thursdays at 10-11 am PT/1-2 pm ET
Registration Required at this link
Orientation Video Coming Soon
Additional Training Tutorials available in the Enclave
Registration for Documents,
Meetings & the N3C Data Enclave
Requires Authentication
Enclave Checklist
● N3C comprises the largest, most representative patient-level COVID-19
cohort in the US and continues to grow
● We CAN do transparent, reproducible, innovative science (including ML)
on sensitive observational data at scale, together!
● N3C is an innovative partnership between clinical sites, CDM
communities, NIH ICs, CD2H, and commercial partners
● Automation of data extraction and minimum requirements reduces
burden and increases site participation
● Robust attribution of all contributors; also provides great venue for
trainees
● N3C data is complicated, but there are many people and resources to
help users do good science
Takeaways
Register with N3C: https://labs.cd2h.org/registration/
Joining Workstreams:
N3C Data Ingestion & Harmonization Workstream
Slack Channel Harmonization
Google Group Harmonization
N3C Phenotype & Data Acquisition Workstream
Slack Channel Phenotype
Google Group Phenotype
N3C Collaborative Analytics Workstream
Slack Channel Analytics
Google Group Analytics
N3C Data Partnership & Governance Workstream
Slack Channel Governance
Google Group Governance
N3C Synthetic Clinical Data Workstream
Slack Channel Synthetic
Google Group Synthetic
N3C Implementation Workstream- Coming soon
Additional Information:
Onboarding N3C, Slack, Google | Finding and Joining a Google Group
NCATS N3C Webpage N3C Website
How to Get Involved with N3C
Melissa A. Haendel,1,4,7,8,10,13,14,52,78,101 Christopher G. Chute,1,4,8,10,13,14,52,78,100,101 Tellen D. Bennett,9,10,13,14,52,100,101 David A. Eichmann,4,9,10,13,78,101 Justin
Guinney,4,9,10,14,78,101 Warren A. Kibbe,9,10,52,78,101 Philip R.O. Payne,4,9,10,78,101 Emily R. Pfaff,9,10,13,15,52,78 Peter N. Robinson,4,9,10,15,52,78,100 Joel H.
Saltz,10,13,14,15,52,78,101 Heidi Spratt,9,10,100 Christine Suver,10,78,101 John Wilbanks,10,78,101 Adam B. Wilcox,10,101 Andrew E. Williams,10,13,78 Chunlei Wu,9,13,14,78
Clair Blacketer,15,52 Robert L. Bradford,9,52 James J. Cimino,10,14,101 Marshall Clark,9,15,52 Evan W. Colmenares,9,15,52 Patricia A. Francis,78 Davera
Gabriel,9,10,13,14,15,52 Alexis Graves,7,9,78 Raju Hemadri,9,15,52 Stephanie S. Hong,9,15,52 George Hripcsak,10,52 Dazhi Jiao,9,15,52 Jeffrey G. Klann,14,52,101 Kristin
Kostka,9,15,52 Adam M. Lee,9,15,52 Harold P. Lehmann,9,15,52 Lora Lingrey,9,15,52 Robert T. Miller,9,15,52 Michele Morris,9,15,52 Shawn N. Murphy,9,15,52 Karthik
Natarajan,9,15,52 Matvey B. Palchuk,9,15,52 Usman Sheikh,9,78 Harold Solbrig,9,15,52 Shyam Visweswaran,10,15,52,101 Anita Walden,7,10,13,14,52,101 Kellie M.
Walters,10,14,101 Griffin M. Weber,10,101 Xiaohan Tanner Zhang,9,15,52 Richard L. Zhu,9,15,52 Benjamin Amor,78 Andrew T. Girvin,15,78 Amin Manna,78 Nabeel
Qureshi,15,78 Michael G. Kurilla,10,78 Sam G. Michael,10,78 Lili M. Portilla,101 Joni L. Rutter,1,101 Christopher P. Austin,101 Ken R. Gersing,78,101
Shaymaa Al-Shukri,4,15 Adil Alaoui,101 Ahmad Baghal,15 Pamela D. Banning,15,100 Edward M. Barbour,8,15 Michael J. Becich,15,52,101 Afshin Beheshti,14 Gordon R. Bernard,8,15 Sharmodeep Bhattacharyya,100 Mark
M. Bissell,9,15 L. Ebony Boulware,14,100 Samuel Bozzette,100,101 Donald E. Brown,101 John B. Buse,14 Brian J. Bush,8,101 Tiffany J. Callahan,14,52 Thomas R. Campion,8,15 Elena Casiraghi,9,15 Ammar A.
Chaudhry,13,14 Guanhua Chen,9 Anjun Chen,13 Gari D. Clifford,8,15 Megan P. Coffee,14,100 Tom Conlin,14 Connor Cook,7,78 Keith A. Crandall,9,14,101 Mariam Deacy,78 Racquel R. Dietz,78 Nicholas J. Dobbins,8,9
Peter L. Elkin,15,52,100 Peter J. Embi,52,101 Julio C. Facelli,8,15 Karamarie Fecho,13 Xue Feng,9 Randi E. Foraker,8,13,15 Tamas S. Gal,8,15 Linqiang Ge,14 George Golovko,15,101 Ramkiran Gouripeddi,14,15 Casey S.
Greene,13,14 Sangeeta Gupta,52,101 Ashish Gupta,13,101 Janos G. Hajagos,9,15 David A. Hanauer,15,52 Jeremy Richard Harper,9,14,52 Nomi L. Harris,14 Paul A. Harris,101 Mehadi R. Hassan,9 Yongqun He,15,52,100
Elaine L. Hill,9,14 Maureen E. Hoatlin,14 Kristi L. Holmes,4,101 LaRon Hughes,14 Randeep S. Jawa,14 Guoqian Jiang,14 Xia Jing,7,14 Marcin P. Joachimiak,8,15 Steven G. Johnson,9,14,101 Rishikesan
Kamaleswaran,9,15,78 Thomas George Kannampallil,15,101 Andrew S. Kanter,15,52 Ramakanth Kavuluru,9,13,14 Kamil Khanipov,8,14 Hadi Kharrazi,9,14 Dongkyu Kim,15,52 Boyd M. Knosp,8,15 Arunkumar Krishnan,9
Tahsin Kurc,9,15 Albert M. Lai,101 Christophe G. Lambert,52,101 Michael Larionov,14 Stephen B. Lee,1,14 Michael D. Lesh,9 Olivier Lichtarge,14 John Liu,9 Sijia Liu,8,9,101 Hongfang Liu,9,15 Johanna J. Loomba,1,15,78,101
Sandeep K. Mallipattu,9,14,15 Chaitanya K. Mamillapalli,14 Christopher E. Mason,15 Jomol P. Mathew,8,15,52 James C. McClay,101 Julie A. McMurry,1,4,7,9,13,14,78 Paras P. Mehta,14 Ofer Mendelevitch,9 Stephane
Meystre,8,14,15 Richard A. Moffitt,9,13,15 Jason H. Moore,8,9 Hiroki Morizono,13,14,15,52 Christopher J. Mungall,15,52 Monica C. Munoz-Torres,7,10,78 Andrew J. Neumann,78 Xia Ning,14 Jennifer E. Nyland,13,14 Lisa
O'Keefe,78 Anna O'Malley,78 Shawn T. O'Neil,78 Jihad S. Obeid,10,14,15 Elizabeth L. Ogburn,13 Jimmy Phuong,9,15,52,100,101 Jose D Posada,8,15 Prateek Prasanna,14,52 Fred Prior,9,14,15 Justin Prosser,9,78 Amanda
Lienau Purnell,101 Ali Rahnavard,9,52 Harish Ramadas,9,52,78 Justin T. Reese,9,10 Jennifer L. Robinson,14,100 Daniel L. Rubin,101 Cody D. Rutherford,9,101 Eugene M. Sadhu,8,15 Amit Saha,9 Mary Morrison
Saltz,15,52,101 Thomas Schaffter,78 Titus KL Schleyer,14 Soko Setoguchi,8,14,15 Nigam H. Shah,8,14 Noha Sharafeldin,14 Evan Sholle,15,52 Jonathan C. Silverstein,15,52,101 Anthony Solomonides,101 Julian Solway,14,101
Jing Su,101 Vignesh Subbian,9,52,101 Hyo Jung Tak,15 Bradley W. Taylor,9,14 Anne E. Thessen,14,101 Jason A. Thomas,15 Umit Topaloglu,15,52 Deepak R. Unni,8,9,15,52 Joshua T. Vogelstein,14 Andréa M. Volz,7 David
A. Williams,14,15 Kelli M. Wilson,9,78 Clark B. Xu,8,9,15 Hua Xu,9,10,14 Yao Yan,9,15,52 Elizabeth Zak,8,15 Lanjing Zhang,101 Chengda Zhang,14 Jingyi Zheng,14
Contributor role key: 1 CREDIT_00000001 (Conceptualization); 4 CREDIT_00000004 (Funding acquisition); 7 CRO_0000007 (Marketing and Communications); 8 CREDIT_00000008 (Resources); 9 CREDIT_00000009 (Software role); 10 CREDIT_00000010 (Supervision role); 13 CREDIT_00000013 (Original draft); 14 CREDIT_00000014 (Review and editing); 15 CRO_0000015 (Data role); 52 CRO_0000052 (Standards role); 78 CRO_0000078 (Infrastructure role); 100 Clinical Use Cases; 101 Governance
https://academic.oup.com/jamia/advance-article/doi/10.1093/jamia/ocaa196/5893482
Questions or Comments?
Thank you!

Real world data, the National COVID-19 Cohort Consortium, and Oncology 2021

  • 1.
    The National COVID Cohort Collaborative (N3C): Let’s Get Involved! Warren A. Kibbe, PhD, FACMI June 15, 2021 Purdue Big Data in Cancer Workshop @data2health @ncats_nih_gov covid.cd2h.org ncats.nih.gov/n3c @wakibbe
  • 2.
    Speaker Objectives Warren Kibbe Duke Biostatistics & Bioinformatics CTSA Informatics Duke Cancer Institute Member N3C ● Real World Data ● Open Science ● Overview of N3C ● N3C Data Enclave statistics ● How common data models and variables are harmonized ● The scope of answerable questions ● Data access and security ● Oncology research in N3C
  • 3.
    Special thanks to: ● Chris Chute, N3C, Johns Hopkins ● Melissa Haendel, N3C, Colorado University ● Umit Topaloglu, N3C, Wake Forest ● Frank Rockhold, Duke ● Noha Sharafeldin, N3C, UAB
  • 4.
    Take homes • N3C represents a unique resource to examine effects of COVID-19 on cancer outcomes • Largest COVID-19 and cancer cohort within the US • Consistent with previous literature, older age, male gender, increasing comorbidities, and hematological malignancies were associated with higher mortality in patients with cancer and COVID-19 • The N3C dataset confirmed that cancer patients with COVID-19 who received recent immuno- or targeted therapies were not at higher risk of overall mortality
  • 5.
    What is Real World Data? Collected in the context of patient care. Real World Data was called out as part of the 21st Century Cures Act 21st Century Cures Act: https://www.fda.gov/regulatory-information/selected-amendments-fdc-act/21st-century-cures-act Graphic from HealthCatalyst: https://www.healthcatalyst.com/insights/real-world-data-chief-driver-drug-development
  • 6.
    Our ability to generate biomedical data continues to grow in terms of variety and volume Current sources of data: molecular, genome, pathology, imaging, labs, notes, sensors (icons by the Noun Project)
  • 7.
    AI is changing our ability to go both deep and broad Trustworthy AI: Provenance, Reusable, Reproducible
  • 8.
    Having a health equity lens ● Digital Health, precision medicine, and real world data all have the power to transform healthcare. However, we must pay attention to structural racism and implicit bias if we want to achieve equity.
  • 9.
    21st Century Cures Act Last year I discussed the NCI Cancer Moonshot and Precision Medicine activities funded under the 21st Century Cures Act. FDA was directed by Congress to focus on the use of RWD and RWE in drug design, development, and outcomes assessment https://www.fda.gov/regulatory-information/selected-amendments-fdc-act/21st-century-cures-act
  • 10.
    Is it just about Real World Data? What about Open Science? Data transparency? Data Access?
  • 11.
    The importance of Open Science Calls for greater transparency and ‘open data access’ in clinical research remain active. ● “Open science is the movement to make scientific research, data and dissemination accessible to all levels of an inquiring society”* ● Open Science Project**: “If we want open science to flourish, we should raise our expectations to: Work. Finish. Publish. Release.” ● FAIR Principles: Findability, Accessibility, Interoperability, and Reusability*** ● TRUST Principles: Transparency, Responsibility, User focus, Sustainability and Technology**** * https://www.fosteropenscience.eu/resources ** http://openscience.org/ *** https://www.nature.com/articles/sdata201618 **** https://www.nature.com/articles/s41597-020-0486-7
  • 12.
    Open Science and Patient Data Access Some of the challenges are: ● Patient privacy ● Academic credit ● Commercial sensitivity and intellectual property ● Data standards ● Resources (money and people) There should be room for researchers and patients alike to gain from this effort. Informatics experts and data scientists are essential elements of this discussion.
  • 13.
    One problem with Clinical Trials Data Sharing ● “The tendency for researchers to ‘sit’ on their data for an unduly long period of time is neither desirable from a scientific point of view nor acceptable from an ethical perspective.” ● “After all, the data belong to the patients who agreed to participate in the research, not to the investigators who coordinated it, as the new European General Data Protection Regulation emphasizes.”* *Rockhold, F, et al. Open science: The open clinical trials data journey, Clinical Trials, Vol 16 (5) 1-8, 2019
  • 14.
    Access to patient-level data is important for research There are certainly challenges, but the question is not whether data should be shared, but rather how and when access should be granted. Responsible open access enables secondary analyses that: ● Enhance reproducibility of clinical research ● Honor the contributions of trial participants ● Improve the design of future trials ● Generate new research findings This journey of making patient data available is part of an evolution in transparency and not a sudden awakening.
  • 15.
    What about N3C? It is an open science, controlled access environment
  • 16.
    Clinical and Translational Science Awards (CTSA) Program
  • 17.
    The pandemic highlights urgent needs ● Algorithms (diagnosis, triage, predictive, etc.) ● Drug discovery & pharmacogenetics ● Multimodal analytics (EHR, imaging, genomics) ● Interventions that reduce disease severity ● Best practices for resource allocation ● Coordinated research efforts to maximize efficiency and reproducibility These all require the creation of a comprehensive clinical data set
  • 18.
    What Kinds of Questions Can N3C Address? The scope and scale of the information in the platform will support probing questions such as: ● What social determinants of health are risk factors for mortality? ● Do some therapies work better than others? By region? By demographics? ● Can we compare local rare clinical observations with national occurrences? ● Can we predict who might have severe outcomes if they have COVID-19? ● What factors will predict the effectiveness of vaccines? ● Can we predict acute kidney injury in COVID-19 patients? ● Who might need a ventilator because of lung failure?
  • 19.
    Cohort characterization objectives To clinically characterize the N3C cohort ● Largest U.S. COVID-19 cohort to date (+ representative controls) ● Racially, ethnically, and geographically diverse To develop and share validated, versioned OMOP representations of common variables (labs, vital signs, medications, treatments) To generate hypotheses to be tested within N3C and elsewhere ● Clinical phenotypes and trajectories ● Treatment patterns and response ● … and many others
  • 20.
    Benefits for Participation ● Access to large scale COVID-19 data from across the nation ● Pilot data for grant proposals ● Opportunities for KL2 and TL1 and other scholars ● Team science opportunities for new questions and access to Teams, statistics, machine learning (ML), informatics expertise ● Learn ML analytics, NLP methods & access to tools, software, additional datasets
  • 21.
    Who is in the N3C? The N3C Computable Phenotype
    ● At a high level, our phenotype looks for patients:
      ○ With a positive COVID-19 test (PCR or antibody), OR
      ○ With an ICD-10-CM code of U07.1, OR
      ○ With two or more COVID-like diagnosis codes (ARDS, pneumonia, etc.) during the same encounter, but only on or prior to 5/1/2020
    ● Each one of these patients is then demographically matched to two patients with negative or equivocal COVID-19 tests.
    ● Each site securely sends this set of patients, along with their longitudinal EHR data from 1/1/2018 to the present, to the N3C on a regular basis.
    Matching algorithm example:
      COVID Positive: Age 47, Gender M, Race Black, Ethnicity Unknown
      COVID Negative: Age 49, Gender M, Race Black, Ethnicity Hispanic/Latino
      COVID Negative: Age 46, Gender M, Race Black, Ethnicity Not Hispanic
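The three inclusion rules above can be sketched as a small predicate. This is only an illustration under stated assumptions: the `Encounter` record, the `meets_phenotype` function, and the two-entry `COVID_LIKE_CODES` set are hypothetical names chosen here, and the slide does not list the actual "COVID-like" codes N3C uses.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Set

# Cutoff taken from the slide: COVID-like codes only count on or before 5/1/2020.
COVID_LIKE_CUTOFF = date(2020, 5, 1)

# Hypothetical, illustrative code set (J80 = ARDS, J18.9 = pneumonia, unspecified).
# The slide does not give the real N3C list.
COVID_LIKE_CODES = {"J80", "J18.9"}


@dataclass
class Encounter:
    """Hypothetical minimal encounter record for this sketch."""
    encounter_date: date
    diagnosis_codes: Set[str] = field(default_factory=set)
    positive_covid_test: bool = False  # PCR or antibody


def meets_phenotype(encounters: List[Encounter]) -> bool:
    """Return True if any encounter satisfies one of the three slide rules."""
    for enc in encounters:
        # Rule 1: positive COVID-19 test.
        if enc.positive_covid_test:
            return True
        # Rule 2: ICD-10-CM code U07.1.
        if "U07.1" in enc.diagnosis_codes:
            return True
        # Rule 3: two or more COVID-like codes in the same encounter,
        # only on or prior to the cutoff date.
        covid_like = enc.diagnosis_codes & COVID_LIKE_CODES
        if len(covid_like) >= 2 and enc.encounter_date <= COVID_LIKE_CUTOFF:
            return True
    return False
```

The demographic matching of each case to two controls is a separate step that this sketch does not attempt.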
  • 22.
    N3C Timeline
  • 23.
    N3C Dashboard covid.cd2h.org/dashboard 55 sites with data released (purple) and 37 sites with data pending (open circle). OCHIN is a national network of 131 sites (diamond). covid.cd2h.org/teams 31 Domain teams! As of June 14, 2021
  • 24.
    Data Transfer Agreement Signatories (6/14/2021): 88 DTA Signatories https://ncats.nih.gov/n3c/resources/data-contribution/data-transfer-agreement-signatories Northwestern University at Chicago ᛫ Tufts Medical Center ᛫ Advocate Health Care Network ᛫ University of Alabama at Birmingham ᛫ Oregon Health & Science University ᛫ University of Washington ᛫ Stanford University ᛫ The University of Michigan at Ann Arbor ᛫ Children's Hospital Colorado ᛫ Duke University ᛫ Medical College of Wisconsin ᛫ The Ohio State University ᛫ University of Nebraska Medical Center ᛫ University of Arkansas for Medical Sciences ᛫ George Washington University ᛫ Johns Hopkins University ᛫ West Virginia University ᛫ Medical University of South Carolina ᛫ University of North Carolina at Chapel Hill ᛫ University of Virginia ᛫ The University of Texas Medical Branch at Galveston ᛫ University of Minnesota ᛫ University of Cincinnati ᛫ Columbia University Irving Medical Center ᛫ Cincinnati Children's Hospital Medical Center ᛫ Rush University Medical Center ᛫ Nemours ᛫ University of Wisconsin-Madison ᛫ The State University of New York at Buffalo ᛫ Washington University in St. Louis ᛫ University of Rochester ᛫ The University of Chicago ᛫ University of Miami ᛫ The Scripps Research Institute ᛫ University of Texas Health Science Center at San Antonio ᛫ University of Kentucky ᛫ University of Illinois at Chicago ᛫ Virginia Commonwealth University ᛫ Weill Medical College of Cornell University ᛫ Carilion Clinic ᛫ University Medical Center New Orleans ᛫ The University of Iowa ᛫ Emory University ᛫ Maine Medical Center ᛫ The University of Texas Health Science Center at Houston ᛫ Boston University Medical Campus ᛫ The University of Utah ᛫ University of Southern California ᛫ George Washington Children's Research Institute ᛫ University of Colorado Denver | Anschutz Medical Campus ᛫ Mayo Clinic Rochester ᛫ The Rockefeller University ᛫ Montefiore Medical Center ᛫ University of Mississippi Medical Center ᛫ University of Oklahoma Health Sciences Center, Board of Regents ᛫ University of Massachusetts Medical School Worcester ᛫ Aurora Health Care ᛫ Penn State ᛫ University of New Mexico Health Sciences Center ᛫ NorthShore University HealthSystem ᛫ Wake Forest University Health Sciences ᛫ Vanderbilt University Medical Center ᛫ Regenstrief Institute ᛫ Brown University ᛫ Stony Brook University ᛫ University of California, Davis ᛫ Yale New Haven Hospital ᛫ Rutgers, The State University of New Jersey ᛫ MedStar Health Research Institute ᛫ Loyola University Chicago ᛫ Loyola University Medical Center ᛫ University of Delaware ᛫ Children's Hospital of Philadelphia
  • 25.
    N3C Enclave Data Stats: Pediatric cases
  • 26.
    N3C Enclave Data Stats: Pediatric cases
  • 27.
    N3C Enclave Data Stats
  • 28.
    Predicting Clinical Severity using machine learning (64 input variables) The most powerful predictors are patient age and widely available vital sign and laboratory values. The National COVID Cohort Collaborative: Clinical Characterization and Early Severity Prediction https://pubmed.ncbi.nlm.nih.gov/33469592/
  • 29.
    How does data get into N3C? ● We have gone through the high-level purpose – EHR data about COVID-19 patients ● Identified the contributing sites ● Know what the inclusion criteria for N3C are – documented COVID-19 testing ● Seen the dashboard overview of N3C and the overall cohort characteristics ● What are the data ingestion, harmonization, query, and publication processes? ● Data governance and security? ● And finally, what about cancer and COVID-19?
  • 30.
    Leveraging Common Data Models ● These four data models are commonly used by academic medical centers throughout the US. ● CDMs are used to store EHR data in a consistent way. ● Sites participating in N3C may send data in one of these four formats; the idea is to make it as convenient as possible for sites to submit. ● Common data models also allow us to write a consistent computable phenotype that can be run with few local changes at sites with one or more of these data models.
  • 31.
    Harmonization of N3C Data
  • 32.
    Data Availability vs Utility ● Collections of data are not always useful ● Even if they are available ● Consistently classified data is always more useful
  • 33.
    FAIR: Findable, Accessible, Interoperable, Reusable What does Interoperable mean with respect to data? Harmonized! Syntactic interoperability (harmonization) ● One can make sense of the structure ● Metaphor: sentence has good grammar ● Domain of the data standards and data model communities Semantic interoperability (harmonization) ● One can make sense of the meaning ● Metaphor: the words are understandable ● Domain of the vocabulary, ontology, classification communities
  • 34.
    N3C Data Ingestion & Harmonization Pipeline (future) Span manual curation of mapping resources to industrial scale production transformation
  • 35.
    Harmonized, not Homogeneous CDMs are built for purpose. Different CDMs emphasize and prioritize different things.
  • 36.
    Collaborative Analytics - N3C Secure Data Enclave: Secure, reproducible, transparent, versioned, provenanced, attributed, and shareable analytics on patient-level EHR data
  • 37.
    Federated versus Centralized DQ Many clinical data research networks are federated; N3C is centralized. Centralized datasets have some advantages where data quality assessment is concerned. Federated Network Centralized Data Questions asked directly against all sites’ data combined
  • 38.
    Federated versus Centralized DQ
    With federated data, sites are benchmarked against themselves. With centralized data, sites can be benchmarked against each other.
    Site 1: We have 43 qualifying inpatient visits.
    Site 2: We have 27 qualifying inpatient visits.
    Site 3: We have 806 qualifying inpatient visits.
    Site  Patient  Visit Type  Adm. Date  Disc. Date
    1     123      IP          7/4/2020   7/8/2020
    1     456      IP          5/6/2020   5/20/2020
    2     987      IP          8/2/2019   8/7/2019
    2     654      IP          9/3/2019   9/14/2019
    3     234      IP          1/26/2021  1/26/2021
    3     234      IP          1/26/2021  1/29/2021
    3     234      IP          1/26/2021  1/30/2021
    3     234      IP          1/26/2021  1/27/2021
    Clearly, sites differ in how they define “a visit.”
  • 39.
    N3C’s DQ Process
    How Would N3C Deal with This Finding?
    ● Discover and discuss at weekly DQ meetings.
    ● Determine: Is this an issue…
      ○ For the site to fix?
      ○ For us to handle on our end?
    ● Reach out to the site to get more information.
      ○ What if they can’t fix it?
    Site  Patient  Visit Type  Adm. Date  Disc. Date
    1     123      IP          7/4/2020   7/8/2020
    1     456      IP          5/6/2020   5/20/2020
    2     987      IP          8/2/2019   8/7/2019
    2     654      IP          9/3/2019   9/14/2019
    3     234      IP          1/26/2021  1/26/2021
    3     234      IP          1/26/2021  1/29/2021
    3     234      IP          1/26/2021  1/30/2021
    3     234      IP          1/26/2021  1/27/2021
  • 40.
    N3C’s DQ Process
    How Would N3C Deal with This Finding?
    ● Discover and discuss at weekly DQ meetings.
    ● Determine: Is this an issue…
      ○ For the site to fix?
      ○ For us to handle on our end?
    ● Reach out to the site to get more information.
      ○ What if they can’t fix it?
    We can write an algorithm to make this site’s visits look more like the other sites:
    if:
    ● the visit type is inpatient
    ● and there are > 1 per patient per day
    then:
    ● merge into a single “macro” visit
    Site  Patient  Visit Type  Adm. Date  Disc. Date
    1     123      IP          7/4/2020   7/8/2020
    1     456      IP          5/6/2020   5/20/2020
    2     987      IP          8/2/2019   8/7/2019
    2     654      IP          9/3/2019   9/14/2019
    3     234      IP          1/26/2021  1/26/2021
    3     234      IP          1/26/2021  1/29/2021
    3     234      IP          1/26/2021  1/30/2021
    3     234      IP          1/26/2021  1/27/2021
  • 41.
    N3C’s DQ Process
    Original Table
    Site  Patient  Visit Type  Adm. Date  Disc. Date
    1     123      IP          7/4/2020   7/8/2020
    1     456      IP          5/6/2020   5/20/2020
    2     987      IP          8/2/2019   8/7/2019
    2     654      IP          9/3/2019   9/14/2019
    3     234      IP          1/26/2021  1/26/2021
    3     234      IP          1/26/2021  1/29/2021
    3     234      IP          1/26/2021  1/30/2021
    3     234      IP          1/26/2021  1/27/2021
    DQ fix: Ready for Analysis
    Site  Patient  Visit Type  Adm. Date  Disc. Date
    1     123      IP          7/4/2020   7/8/2020
    1     456      IP          5/6/2020   5/20/2020
    2     987      IP          8/2/2019   8/7/2019
    2     654      IP          9/3/2019   9/14/2019
    3     234      IP          1/26/2021  1/30/2021
    Takeaways
    ● Centralized DQ processes allow us to fully realize the potential of N3C’s large sample size.
    ● All transformations are fully logged and always completely reversible if needed.
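The "macro visit" fix shown in the tables can be sketched in a few lines. This is a minimal illustration, not N3C's pipeline code: the function name `merge_inpatient_visits` and the tuple layout are assumptions made here, and "more than one per patient per day" is interpreted as several inpatient rows sharing the same site, patient, and admission date. The logging and reversibility mentioned in the takeaways are not modeled.

```python
from datetime import date


def merge_inpatient_visits(visits):
    """Collapse same-day inpatient rows into one "macro" visit.

    visits: list of (site, patient, visit_type, admit_date, discharge_date).
    Inpatient ("IP") rows that share site, patient, and admission date are
    merged, keeping the latest discharge date. Other rows pass through.
    """
    merged = {}  # (site, patient, admit_date) -> latest discharge date
    other = []
    for site, patient, visit_type, admit, discharge in visits:
        if visit_type != "IP":
            other.append((site, patient, visit_type, admit, discharge))
            continue
        key = (site, patient, admit)
        merged[key] = max(merged.get(key, discharge), discharge)
    # Dicts preserve insertion order, so rows keep their first-seen order.
    macro = [(s, p, "IP", a, d) for (s, p, a), d in merged.items()]
    return macro + other


# The eight rows from the "Original Table" on the slide.
original = [
    (1, 123, "IP", date(2020, 7, 4), date(2020, 7, 8)),
    (1, 456, "IP", date(2020, 5, 6), date(2020, 5, 20)),
    (2, 987, "IP", date(2019, 8, 2), date(2019, 8, 7)),
    (2, 654, "IP", date(2019, 9, 3), date(2019, 9, 14)),
    (3, 234, "IP", date(2021, 1, 26), date(2021, 1, 26)),
    (3, 234, "IP", date(2021, 1, 26), date(2021, 1, 29)),
    (3, 234, "IP", date(2021, 1, 26), date(2021, 1, 30)),
    (3, 234, "IP", date(2021, 1, 26), date(2021, 1, 27)),
]
ready = merge_inpatient_visits(original)
```

With the slide's rows, the four site 3 entries for patient 234 collapse into a single visit from 1/26/2021 to 1/30/2021, matching the "Ready for Analysis" table.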
  • 42.
    N3C Data Ingestion & Harmonization Pipeline (future)
  • 43.
    Harmonizing numeric data
    ● Problem: Different sites provide their data in different units
    ● Solution: Harmonize each to a standard unit
    Kilograms = Pounds / 2.20462
    Kilograms = Ounces / 35.274
    Kilograms = Grams / 1000
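The three conversions above amount to a lookup table of divisors. A minimal sketch, assuming short unit labels ("lb", "oz", "g", "kg") and a function name `to_kilograms` that are chosen here for illustration and are not taken from the slides:

```python
# Divisors copied from the slide; "kg" is the canonical unit.
KG_DIVISORS = {
    "kg": 1.0,
    "lb": 2.20462,
    "oz": 35.274,
    "g": 1000.0,
}


def to_kilograms(value: float, unit: str) -> float:
    """Convert a body-weight value to kilograms using a known unit."""
    try:
        divisor = KG_DIVISORS[unit]
    except KeyError:
        # Unknown or missing units are left for a separate inference step.
        raise ValueError(f"no known conversion for unit {unit!r}")
    return value / divisor
```

For example, `to_kilograms(70000, "g")` gives 70.0, and an unrecognized unit raises an error rather than silently passing the value through.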
  • 44.
    Harmonizing numeric data
    ● Problem: Some units are missing
    ● Solution 1: Contact the source
    ● Solution 2: N3C inference engine
    Kilograms = x / 2.20462 ?
    Kilograms = x / 35.274 ?
    Kilograms = x / 1000 ?
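The slides do not describe how the N3C inference engine works, so the following is only one plausible approach, sketched under loud assumptions: it tries each candidate conversion on a batch of unit-less values and keeps the unit whose converted median lands closest (on a log scale) to a reference median. The reference value of 80 kg, the function name `infer_unit`, and the unit labels are all hypothetical.

```python
import math
from statistics import median

# Candidate divisors from the slide, plus the identity for kilograms.
KG_DIVISORS = {"kg": 1.0, "lb": 2.20462, "oz": 35.274, "g": 1000.0}

# Hypothetical reference median for adult body weight, in kilograms.
REFERENCE_MEDIAN_KG = 80.0


def infer_unit(values, reference_kg: float = REFERENCE_MEDIAN_KG) -> str:
    """Guess the unit of a batch of positive weights with no unit label.

    Each candidate unit is scored by how far the converted median is from
    the reference median on a log scale; the smallest distance wins.
    """
    batch_median = median(values)

    def log_distance(unit: str) -> float:
        converted = batch_median / KG_DIVISORS[unit]
        return abs(math.log(converted / reference_kg))

    return min(KG_DIVISORS, key=log_distance)
```

Working on the whole batch rather than on single values matters: a lone value of 70 could plausibly be kilograms or pounds, while a distribution centered near 170 is much more likely to be pounds.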
  • 45.
    Harmonization progress ● Harmonized measurements ○ By original unit ○ Across many sites Homogeneity after harmonization Humans measured in grams do not look the same as humans measured in kilograms!
  • 46.
    Unit harmonization progress Legend: Canonical unit; Uses a known conversion; Unit not plausible; Missing unit inferred; Unit still missing ● ~2x increase in usable data from our harmonization procedures We can rescue a lot of data!
  • 47.
    N3C Data Ingestion & Harmonization Pipeline (future)
  • 48.
    Pharyngalgia = Sore throat. Plain-language medical vocabulary for precision diagnosis. Nat Genet. 2018 50:474-476. Long-COVID phenotypes are myriad; patient-reported and researcher-measured phenotypes are starkly different. Map literature and patient-reported terms to HPO
  • 49.
    N3C Harmonization Takeaways What N3C has revealed most in terms of needs: ● Interoperability - we need syntactic and semantic! ○ FHIR ⇒ OMOP (syntactic) ○ Common vocabulary/codeset mapping provenance and management (semantic) ● Approach data harmonization from an end-to-end data life cycle perspective ● Leverage USCDI, but build for interoperable semantic modeling and extensions
  • 50.
    Governing N3C Data
  • 51.
    N3C: Unique Data Use and Privacy Goal of the Data Use Agreement is Privacy Protection to Promote broad access: ● COVID-Related research only ● NIH housed secure repository ● No re-identification of individuals or data source ● No download or capture of raw data ● Open platform to all researchers ● Investigator activities are recorded and can be audited for security and reproducibility
  • 52.
  • 53.
  • 54.
    Data Use and Privacy Goal of the Data Use Agreement is Privacy Protection to Promote broad access: ● COVID-Related research only ● No re-identification of individuals or data source ● No download or capture of raw data ● Open platform to all researchers ● Security: Activities in the N3C Data Enclave are recorded and can be audited ● Disclosure of research results to the N3C Data Enclave for the public good ● Analytics provenance ● Contributor Attribution tracking
  • 55.
    N3C Provenance, Transparency, Attribution & Rapid Sharing N3C Attribution and Publication Principles ● Transparent and collaborative environment where all contributions are acknowledged ● Provenance and reproducibility ● Promptly sharing research results with N3C users ● Publish in high-impact journals ● Attribution for all N3C artifacts Researchers, projects, and artifacts are all linked together in the enclave using the Contributor Attribution Model (CAM).
  • 56.
    N3C Data Access: Process Data Use Request, HSP / Security Training, Data Use Agreement https://ncats.nih.gov/n3c/about/applying-for-access
  • 57.
    Realizing Team Science
  • 58.
    Key functions can nucleate projects: ● Education & training ● Biostatistics ● Study design ● Evaluation ● Informatics ● Clinical expertise ● Innovation & commercialization ● Community & partnerships N3C Domain Team Expertise: ● Enclave technology ● Data model (OMOP) ● Terminologies ● Data quality ● Codesets, variables, phenotype ● Using/parsing N3C data ● Workflows, methods, algorithms Roles Ingredients (Methods, datasets, instruments) Scientific questions N3C team Science within & across institutions https://covid.cd2h.org/domain-teams CTSAs
  • 59.
    OUTCOMES OF COVID-19 IN CANCER PATIENTS: REPORT FROM THE NATIONAL COVID COHORT COLLABORATIVE (N3C) Noha Sharafeldin, Benjamin Bates, Qianqian Song, Vithal Madhira, Yao Yan, Sharlene Dong, Eileen Lee, Nathaniel Kuhrt, Yu Raymond Shao, Feifan Liu, Timothy Bergquist, Justin Guinney, Jing Su, Umit Topaloglu on behalf of the N3C Consortium Given on June 4, 2021 https://covid.cd2h.org/ cd2h.slack.com @data2health
  • 60.
    N3C Oncology Domain Team (ODT) Noha Sharafeldin, MBBCh, PhD Leadership: Benjamin Bates, MD, Rutgers University; Umit Topaloglu, PhD, Wake Forest University; Noha Sharafeldin, MD, PhD, The University of Alabama at Birmingham https://covid.cd2h.org/oncology Slack channel: #n3c-tt-oncology
  • 61.
    N3C ODT Expertise Noha Sharafeldin Informatics Biostatistics Clinical Epidemiology N3C data and Logic Umit Topaloglu Jing Su Benjamin Bates Justin Guinney Vithal Madhira Tim Bergquist Feifan Liu Qianqian Song Yu Raymond Shao Nate Kuhrt Sharlene Dong Eileen Lee Yao Yan
  • 62.
    N3C Oncology http://ascopubs.org/doi/full/10.1200/JCO.21.01074
  • 63.
    N3C Cancer Cohort Primary Diagnosis
  • 64.
    N3C Cancer Cohort Primary Outcome • All-cause mortality Secondary Outcomes (Clinical severity indicators requiring hospitalization) • Mechanical Ventilation
  • 65.
    Demographic, clinical, and tumor characteristics (COVID-19 Positive)
    Age: 18-29, 30-49, 50-64, 65+ (2%, 13%, 31%, 54%)
    Race: Hispanic, Non-Hispanic Black, Non-Hispanic White, Other or Unknown (4%, 13%, 61%, 22%)
    Sex: Female, Male (51%, 49%)
    Geographical Location: US-Northeast, US-Midwest, US-South, US-West, Unknown (11%, 34%, 28%, 5%, 22%)
  • 66.
    Demographic, clinical, and tumor characteristics (COVID-19 Positive)
    Smoking status: Non-smoker, Current or Former smoker (86%, 14%)
    Adjusted CCI: 0, 1, 2, 3, ≥4 (41%, 16%, 9%, 6%, 28%)
  • 67.
    Demographic, clinical, and tumor characteristics (COVID-19 Positive)
    Cancer type: Skin cancers, Breast cancer, Prostate cancer, Hematological cancers, Gastrointestinal cancers, Multi-site (15%, 14%, 12%, 12%, 9%, 11%)
    Type of primary malignancy: Solid, Liquid, Multi-Site, Unknown, Undefined Primary (71%, 12%, 11%, 3%, 3%)
  • 68.
    COVID-19 Treatment
    COVID-19 Treatment (Yes)    COVID positive (n=38,614)
    Systemic antibiotics        4032 (15.75%)
    Systemic steroids           3514 (13.73%)
    Azithromycin                1197 (4.68%)
    Remdesivir                  1047 (4.09%)
    Dexamethasone               1029 (4.02%)
    Hydroxychloroquine (HCQ)    364 (1.42%)
  • 69.
    Death and invasive ventilation in hospitalized patients
    Outcome               COVID positive (n=19,515)   COVID negative (n=184,988)
    Death                 2,894 (14.8%)               23,207 (12.5%)
    Invasive Ventilation  1,606 (8.2%)                9,576 (5.2%)
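The percentages on this slide are simple proportions of each hospitalized group, which can be re-derived from the counts. A small sketch (the helper name `pct` is just an illustrative choice):

```python
def pct(count: int, total: int) -> float:
    """Percentage of `total` represented by `count`, rounded to one decimal."""
    return round(100 * count / total, 1)


# Counts and denominators as reported on the slide.
covid_positive_n = 19_515
covid_negative_n = 184_988

death_pos = pct(2_894, covid_positive_n)
vent_pos = pct(1_606, covid_positive_n)
death_neg = pct(23_207, covid_negative_n)
vent_neg = pct(9_576, covid_negative_n)
```

These reproduce the reported 14.8% and 8.2% for COVID-positive patients and 12.5% and 5.2% for COVID-negative patients. They are crude proportions and are distinct from the adjusted hazard ratio reported on the survival slide.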
  • 70.
    Survival Probability by COVID status: HR = 1.20 (95% CI: 1.15 – 1.24, p<0.001)
  • 71.
    Survival Probability by cancer type among COVID-positive patients
  • 72.
    Hazard ratios associated with 1-year all-cause mortality among COVID-positive patients
  • 73.
    Hazard ratios associated with 1-year all-cause mortality among COVID-positive patients
  • 74.
    Hazard ratios associated with 1-year all-cause mortality among COVID-positive patients
  • 75.
    Hazard ratios associated with 1-year all-cause mortality among COVID-positive patients
  • 76.
    Limitations • RWD Challenges (e.g. data missingness) • Limited capture of recent cancer therapy • Potential misclassification of cancer patients • Challenges in primary cancer diagnosis mapping and limited historical data • Method for construction of COVID-19 negative control
  • 77.
    Conclusions • N3C represents a unique resource to examine effects of COVID-19 on cancer outcomes • Largest COVID-19 and cancer cohort within the US • Consistent with previous literature, older age, male gender, increasing comorbidities, and hematological malignancies were associated with higher mortality in patients with cancer and COVID-19 • The N3C dataset confirmed that cancer patients with COVID-19 who received recent immuno- or targeted therapies were not at higher risk of overall mortality
  • 78.
    Acknowledgements The Patients US Data Partners N3C Consortial Authors Christopher Chute Melissa Haendel Amit Mitra Ramakanth Kavuluru NCATS U24 TR002306 NIGMS 5U54GM104942-04 NCI P30CA012197 [UT, QS] LLS 3386-19 [NS] Indiana University Precision Health Initiative [JS] N3C Core Teams
  • 79.
    Acknowledgements
    We gratefully acknowledge contributions from the following N3C core teams:
      • Principal Investigators: Melissa A. Haendel*, Christopher G. Chute*, Kenneth R. Gersing, Anita Walden
      • Workstream, subgroup and administrative leaders: Melissa A. Haendel*, Tellen D. Bennett, Christopher G. Chute, David A. Eichmann, Justin Guinney, Warren A. Kibbe, Hongfang Liu, Philip R.O. Payne, Emily R. Pfaff, Peter N. Robinson, Joel H. Saltz, Heidi Spratt, Justin Starren, Christine Suver, Adam B. Wilcox, Andrew E. Williams, Chunlei Wu
      • Key liaisons at data partner sites
      • Regulatory staff at data partner sites
      • Individuals at the sites who are responsible for creating the datasets and submitting data to N3C
      • Data Ingest and Harmonization Team: Christopher G. Chute*, Emily R. Pfaff*, Davera Gabriel, Stephanie S. Hong, Kristin Kostka, Harold P. Lehmann, Richard A. Moffitt, Michele Morris, Matvey B. Palchuk, Xiaohan Tanner Zhang, Richard L. Zhu
      • Phenotype Team (individuals who create the scripts that the sites use to submit their data, based on the COVID and Long COVID definitions): Emily R. Pfaff*, Benjamin Amor, Mark M. Bissell, Marshall Clark, Andrew T. Girvin, Stephanie S. Hong, Kristin Kostka, Adam M. Lee, Robert T. Miller, Michele Morris, Matvey B. Palchuk, Kellie M. Walters
      • Project Management and Operations Team: Anita Walden*, Yooree Chae, Connor Cook, Alexandra Dest, Racquel R. Dietz, Thomas Dillon, Patricia A. Francis, Rafael Fuentes, Alexis Graves, Julie A. McMurry, Andrew J. Neumann, Shawn T. O'Neil, Andréa M. Volz, Elizabeth Zampino
      • Partners from NIH and other federal agencies: Christopher P. Austin*, Kenneth R. Gersing*, Samuel Bozzette, Mariam Deacy, Nicole Garbarini, Michael G. Kurilla, Sam G. Michael, Joni L. Rutter, Meredith Temple-O'Connor
      • Analytics Team (individuals who build the Enclave infrastructure, help create codesets and variables, and help Domain Teams and project teams with their datasets): Benjamin Amor*, Mark M. Bissell, Katie Rebecca Bradwell, Andrew T. Girvin, Amin Manna, Nabeel Qureshi
      • Publication Committee Management Team: Mary Morrison Saltz*, Christine Suver*, Christopher G. Chute, Melissa A. Haendel, Julie A. McMurry, Andréa M. Volz, Anita Walden
      • Publication Committee Review Team: Carolyn Bramante, Jeremy Richard Harper, Wenndy Hernandez, Farrukh M Koraishy, Federico Mariona, Saidulu Mattapally, Amit Saha, Satyanarayana Vedula
  • 80.
    N3C Registration/Training
    https://covid.cd2h.org/tutorials
      • Training Office Hours: Tuesdays & Thursdays at 10-11 am PT/1-2 pm ET (registration required at this link)
      • Orientation Video: coming soon
      • Additional training tutorials available in the Enclave
      • Registration for Documents, Meetings & the N3C Data Enclave requires authentication
      • Enclave Checklist
  • 81.
    Takeaways
    ● N3C comprises the largest, most representative patient-level COVID-19 cohort in the US and continues to grow
    ● We CAN do transparent, reproducible, innovative science (including ML) on sensitive observational data at scale, together!
    ● N3C is an innovative partnership between clinical sites, CDM communities, NIH ICs, CD2H, and commercial partners
    ● Automation of data extraction and minimum requirements reduce burden and increase site participation
    ● Robust attribution of all contributors; also provides a great venue for trainees
    ● N3C data is complicated, but there are many people and resources to help users do good science
    Step 4. Federated Analytics with HPC
  • 82.
    How to Get Involved with N3C
    Register with N3C: https://labs.cd2h.org/registration/
    Joining Workstreams:
      • N3C Data Ingestion & Harmonization Workstream: Slack Channel Harmonization; Google Group Harmonization
      • N3C Phenotype & Data Acquisition Workstream: Slack Channel Phenotype; Google Group Phenotype
      • N3C Collaborative Analytics Workstream: Slack Channel Analytics; Google Group Analytics
      • N3C Data Partnership & Governance Workstream: Slack Channel Governance; Google Group Governance
      • N3C Synthetic Clinical Data Workstream: Slack Channel Synthetic; Google Group Synthetic
      • N3C Implementation Workstream: coming soon
    Additional Information: Onboarding N3C, Slack, Google | Finding and Joining a Google Group
    NCATS N3C Webpage
    N3C Website
  • 83.
    Melissa A. Haendel,1,4,7,8,10,13,14,52,78,101; Christopher G. Chute,1,4,8,10,13,14,52,78,100,101; Tellen D. Bennett,9,10,13,14,52,100,101; David A. Eichmann,4,9,10,13,78,101; Justin Guinney,4,9,10,14,78,101; Warren A. Kibbe,9,10,52,78,101; Philip R.O. Payne,4,9,10,78,101; Emily R. Pfaff,9,10,13,15,52,78; Peter N. Robinson,4,9,10,15,52,78,100; Joel H. Saltz,10,13,14,15,52,78,101; Heidi Spratt,9,10,100; Christine Suver,10,78,101; John Wilbanks,10,78,101; Adam B. Wilcox,10,101; Andrew E. Williams,10,13,78; Chunlei Wu,9,13,14,78; Clair Blacketer,15,52; Robert L. Bradford,9,52; James J. Cimino,10,14,101; Marshall Clark,9,15,52; Evan W. Colmenares,9,15,52; Patricia A. Francis,78; Davera Gabriel,9,10,13,14,15,52; Alexis Graves,7,9,78; Raju Hemadri,9,15,52; Stephanie S. Hong,9,15,52; George Hripscak,10,52; Dazhi Jiao,9,15,52; Jeffrey G. Klann,14,52,101; Kristin Kostka,9,15,52; Adam M. Lee,9,15,52; Harold P. Lehmann,9,15,52; Lora Lingrey,9,15,52; Robert T. Miller,9,15,52; Michele Morris,9,15,52; Shawn N. Murphy,9,15,52; Karthik Natarajan,9,15,52; Matvey B. Palchuk,9,15,52; Usman Sheikh,9,78; Harold Solbrig,9,15,52; Shyam Visweswaran,10,15,52,101; Anita Walden,7,10,13,14,52,101; Kellie M. Walters,10,14,101; Griffin M. Weber,10,101; Xiaohan Tanner Zhang,9,15,52; Richard L. Zhu,9,15,52; Benjamin Amor,78; Andrew T. Girvin,15,78; Amin Manna,78; Nabeel Qureshi,15,78; Michael G. Kurilla,10,78; Sam G. Michael,10,78; Lili M. Portilla,101; Joni L. Rutter,1,101; Christopher P. Austin,101; Ken R. Gersing,78,101; Shaymaa Al-Shukri,4,15; Adil Alaoui,101; Ahmad Baghal,15; Pamela D. Banning,15,100; Edward M. Barbour,8,15; Michael J. Becich,15,52,101; Afshin Beheshti,14; Gordon R. Bernard,8,15; Sharmodeep Bhattacharyya,100; Mark M. Bissell,9,15; L. Ebony Boulware,14,100; Samuel Bozzette,100,101; Donald E. Brown,101; John B. Buse,14; Brian J. Bush,8,101; Tiffany J. Callahan,14,52; Thomas R. Campion,8,15; Elena Casiraghi,9,15; Ammar A. Chaudhry,13,14; Guanhua Chen,9; Anjun Chen,13; Gari D. Clifford,8,15; Megan P. Coffee,14,100; Tom Conlin,14; Connor Cook,7,78; Keith A. Crandall,9,14,101; Mariam Deacy,78; Racquel R. Dietz,78; Nicholas J. Dobbins,8,9; Peter L. Elkin,15,52,100; Peter J. Embi,52,101; Julio C. Facelli,8,15; Karamarie Fecho,13; Xue Feng,9; Randi E. Foraker,8,13,15; Tamas S. Gal,8,15; Linqiang Ge,14; George Golovko,15,101; Ramkiran Gouripeddi,14,15; Casey S. Greene,13,14; Sangeeta Gupta,52,101; Ashish Gupta,13,101; Janos G. Hajagos,9,15; David A. Hanauer,15,52; Jeremy Richard Harper,9,14,52; Nomi L. Harris,14; Paul A. Harris,101; Mehadi R. Hassan,9; Yongqun He,15,52,100; Elaine L. Hill,9,14; Maureen E. Hoatlin,14; Kristi L. Holmes,4,101; LaRon Hughes,14; Randeep S. Jawa,14; Guoqian Jiang,14; Xia Jing,7,14; Marcin P. Joachimiak,8,15; Steven G. Johnson,9,14,101; Rishikesan Kamaleswaran,9,15,78; Thomas George Kannampallil,15,101; Andrew S. Kanter,15,52; Ramakanth Kavuluru,9,13,14; Kamil Khanipov,8,14; Hadi Kharrazi,9,14; Dongkyu Kim,15,52; Boyd M. Knosp,8,15; Arunkumar Krishnan,9; Tahsin Kurc,9,15; Albert M. Lai,101; Christophe G. Lambert,52,101; Michael Larionov,14; Stephen B. Lee,1,14; Michael D. Lesh,9; Olivier Lichtarge,14; John Liu,9; Sijia Liu,8,9,101; Hongfang Liu,9,15; Johanna J. Loomba,1,15,78,101; Sandeep K. Mallipattu,9,14,15; Chaitanya K. Mamillapalli,14; Christopher E. Mason,15; Jomol P. Mathew,8,15,52; James C. McClay,101; Julie A. McMurry,1,4,7,9,13,14,78; Paras P. Mehta,14; Ofer Mendelevitch,9; Stephane Meystre,8,14,15; Richard A. Moffitt,9,13,15; Jason H. Moore,8,9; Hiroki Morizono,13,14,15,52; Christopher J. Mungall,15,52; Monica C. Munoz-Torres,7,10,78; Andrew J. Neumann,78; Xia Ning,14; Jennifer E. Nyland,13,14; Lisa O'Keefe,78; Anna O'Malley,78; Shawn T. O'Neil,78; Jihad S. Obeid,10,14,15; Elizabeth L. Ogburn,13; Jimmy Phuong,9,15,52,100,101; Jose D Posada,8,15; Prateek Prasanna,14,52; Fred Prior,9,14,15; Justin Prosser,9,78; Amanda Lienau Purnell,101; Ali Rahnavard,9,52; Harish Ramadas,9,52,78; Justin T. Reese,9,10; Jennifer L. Robinson,14,100; Daniel L. Rubin,101; Cody D. Rutherford,9,101; Eugene M. Sadhu,8,15; Amit Saha,9; Mary Morrison Saltz,15,52,101; Thomas Schaffter,78; Titus KL Schleyer,14; Soko Setoguchi,8,14,15; Nigam H. Shah,8,14; Noha Sharafeldin,14; Evan Sholle,15,52; Jonathan C. Silverstein,15,52,101; Anthony Solomonides,101; Julian Solway,14,101; Jing Su,101; Vignesh Subbian,9,52,101; Hyo Jung Tak,15; Bradley W. Taylor,9,14; Anne E. Thessen,14,101; Jason A. Thomas,15; Umit Topaloglu,15,52; Deepak R. Unni,8,9,15,52; Joshua T. Vogelstein,14; Andréa M. Volz,7; David A. Williams,14,15; Kelli M. Wilson,9,78; Clark B. Xu,8,9,15; Hua Xu,9,10,14; Yao Yan,9,15,52; Elizabeth Zak,8,15; Lanjing Zhang,101; Chengda Zhang,14; Jingyi Zheng,14
    1 = CREDIT_00000001 (Conceptualization); 4 = CREDIT_00000004 (Funding acquisition); 7 = CRO_0000007 (Marketing and Communications); 8 = CREDIT_00000008 (Resources); 9 = CREDIT_00000009 (Software role); 10 = CREDIT_00000010 (Supervision role); 13 = CREDIT_00000013 (Original draft); 14 = CREDIT_00000014 (Review and editing); 15 = CRO_0000015 (Data role); 52 = CRO_0000052 (Standards role); 78 = CRO_0000078 (Infrastructure role); 100 = Clinical Use Cases; 101 = Governance
    https://academic.oup.com/jamia/advance-article/doi/10.1093/jamia/ocaa196/5893482
  • 84.
  • 85.
    Thank you!