An Introduction to
CCDH
Joint meeting of the CRDC & the Center for Cancer Data Harmonization
Date: June 29, 2020
https://datascience.cancer.gov/data-commons/center-cancer-data-harmonization-ccdh
These slides: bit.ly/ccdh-crdc-june-2
Outline
● Synthesis of information from CRDC and insights
derived | Sam, Melissa
● Presentation of Harmonized Data Model | Brian & Matt
● Ontology landscape and terminological requirements |
Jim, Harold, Dazhi
Community
Development
(Lead: Volchenboum;
Co-Lead: Vasilevsky)
Data Model
harmonization
(Lead: Chute;
Co-Lead: Furner)
Ontology &
Terminology
Ecosystem
(Lead: Solbrig)
Tools & Data Quality
(Lead: Balhoff)
Programmatic oversight
CBIIT: Sherri De Coronado, Allen Dearry
FNL: Todd Pihl, Resham Kulkarni
Program Management and operations
(Lead: Haendel, Co-Lead: Munoz-Torres)
Role of CCDH in the CRDC ecosystem
Facilitate retrospective and prospective
semantic harmonization of data across
nodes of the CRDC
Coordinate the community to ensure quality
“fit for purpose” design and implementation
of standards that will facilitate
interoperability of heterogeneous data
types and CRDC resources
Find agreement across the communities
built around CRDC
- match and extend data models
- annotation, harmonization
- quality assurance
Data Model
harmonization
(Lead: Chute
Co-Lead: Furner)
Ontology &
Terminology
Ecosystem
(Lead: Solbrig)
Tools & Data
Quality
(Lead: Balhoff)
Schema to
schema
OMOP to
FHIR
Term to
Term
Oncotree to
NCIt
Data records to
data records
“Smoking status 3
packs per day” to
NCIT:C154510
[Heavy Smoker]
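The record-level case above (free-text data mapped to a coded term) can be sketched as follows. This is a minimal illustration only: the slide gives the single example, so the cutoff for "heavy", the function name, and the parsing rule are assumptions made for the sketch, not CCDH or clinical definitions.

```python
import re
from typing import Optional, Tuple

# Target concept taken from the slide.
HEAVY_SMOKER = ("NCIT:C154510", "Heavy Smoker")
# Hypothetical cutoff, invented for this sketch.
ASSUMED_HEAVY_PACKS_PER_DAY = 1.0

def map_smoking_status(raw: str) -> Optional[Tuple[str, str]]:
    """Map a free-text smoking record to a coded term, or None if unmapped."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*packs? per day", raw, re.IGNORECASE)
    if match is None:
        return None
    packs = float(match.group(1))
    if packs >= ASSUMED_HEAVY_PACKS_PER_DAY:
        return HEAVY_SMOKER
    return None  # lighter categories are out of scope for this sketch

coded = map_smoking_status("Smoking status 3 packs per day")
```

Records that do not match the pattern return None rather than a guessed code, so unmapped values stay visible for curation.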
Synthesis of information from
CRDC and insights derived
Community Development Working Group
Goals:
● Engage CRDC stakeholders: interviews to identify and document semantic priorities
● Document current platforms
● Develop plans to support core semantic standards and concierge services
Completed interviews
DCF: Data Commons Framework -
Infrastructure
Node
HTAN: Human Tumor Atlas Network
ICDC: Integrated Canine Data Commons
IDC: Imaging Data Commons
GDC: Genomic Data Commons
PDC: Proteomics Data Commons
Future interviews
Gabriella Miller Kids First Data Resource
Center
Node
CDS: Cancer Data Services
Broad Institute FireCloud
Institute for Systems Biology
SevenBridges
NBIA: National Biomedical Imaging
Archive
SEER Virtual Tissue Repository
CIDC: Cancer Immunology Data
Commons
Summary matrix from initial interviews
Community Development - Phase II - Pilot
● Provision of help desk services (office hours and GitHub issue tracker)
● Data preparation services
○ mapping and transformations of terminologies and models
○ metadata validation
○ data annotation
● Web portal development
● Work with the nodes to assist mapping and transformation of data
● Develop user support documentation and materials
Main user base is the node developers, but these users will also benefit
Establish a
transparent
process for
community
discussion,
modification,
and acceptance
of new or
modified
content (GitHub)
Community Development - Phase III/IV - Production and Operations
Concierge
services for
CRDC nodes,
DCC, DCF,
other end users
Continue
collecting user
questions and
feedback to
improve
services and
identify user
needs and pain
points
Enable the
users to find the
resources they
need and to be
able to use the
portal
independently
Web portal
enhancements /
load testing
Unit tests / QC
CCDH Harmonized Data Model
● Will provide a single data model that harmonizes
syntax and semantics across the CRDC systems
and services.
● This CRDC-H model will enable data
aggregation and exchange to facilitate
integrated search, navigation, and
metadata-based analysis
● We will align with community standards where
possible (e.g. FHIR, BRIDG) to promote broader
interoperability, and leverage mappings and
tools provided by these efforts
Data Model Harmonization: Overview
Ecosystem of CRDC repositories,
services and stakeholders
1. Standardize Source Data
Model Documentation
2. Generate an Aggregated
Data Model (ADM)
3. Map the ADM to
Community Standard Data
Models
4. Refactor the ADM into a
Conceptual Domain
Model (CDM)
5. Refactor the CDM to a
Logical Data Model
(CRDC-H)
An iterative process through which source model content is evaluated, aggregated,
mapped, and refactored into a standards-aligned and harmonized data model.
CRDC-H Model Development Workflow
Abstract specification
Low harmonization
Not standards-aligned
Concrete specification
Deep harmonization
Standards-aligned
● Targeted four source models
(GDC, PDC, ICDC, HTAN)
● Focused on Biospecimen
and Administrative
subdomains
● Harmonized entities and
attributes, not data types or
value sets/terminologies
● Informed by BRIDG and FHIR
standards
● Produced an exploratory
conceptual model (does not
yet support implementation)
Lessons learned from this narrow but deep dive will inform subsequent
iterations that incorporate new data sources and subdomains.
Phase I: CDM Prototype Development
The Aggregated Data Model (ADM)
GDC
26 entities,
561 attributes
ADM
55 entities,
984 attributes
PDC
27 entities,
500 attributes
ICDC
27 entities,
265 attributes
The Aggregated Data Model (ADM)
A substrate
for
refactoring
into more
deeply
harmonized
models
Node models are not well aligned at the outset
● e.g. ICDC and GDC: ~30% entity equivalence, <5% attribute equivalence
Property aggregation in the ADM is based on superficial analysis and strict
aggregation criteria - so harmonization is minimal
● Only strictly equivalent elements within strictly equivalent entities are merged
Deeper aggregation and harmonization of elements will be achieved as the
ADM is refactored into the CDM
● Terminological - e.g. GDC 'Treatment' vs ICDC 'Agent Administration'
● Structural - e.g. ICDC provides a more normalized model for clinical metadata
● Semantic - e.g. harmonizing disease terminologies used across systems / species
● Precision - e.g. variable detail provided about tumor staging across models
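The strict aggregation rule described above can be sketched as follows. This is a minimal illustration, assuming exact name equality as a stand-in for "strictly equivalent" (the slides do not define the equivalence test); the `aggregate` function and the toy entity and attribute names are hypothetical.

```python
from collections import defaultdict

def aggregate(models: dict) -> dict:
    """Merge attributes only when entity name and attribute name match exactly.

    `models` maps node name -> {entity name -> set of attribute names}.
    Returns {entity -> {attribute -> list of contributing nodes}}.
    """
    adm = defaultdict(lambda: defaultdict(list))
    for node, entities in sorted(models.items()):
        for entity, attributes in entities.items():
            for attribute in attributes:
                adm[entity][attribute].append(node)
    return {entity: dict(attrs) for entity, attrs in adm.items()}

# Toy inputs; these entity and attribute names are invented for illustration.
models = {
    "GDC": {"sample": {"sample_id", "freezing_method"}, "treatment": {"agent"}},
    "ICDC": {"sample": {"sample_id"}, "agent_administration": {"agent"}},
}
adm = aggregate(models)
```

In this toy input, `treatment` and `agent_administration` stay separate entities because their names differ, which is the kind of terminological mismatch left for the later ADM to CDM refactoring.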
High-Level
Structural
Changes
Resulting
from ADM
Refactoring
(Biospecimen
Subdomain)
The Conceptual Domain Model (CDM) Prototype
1. Specialization: Specialized specimen subtypes in the ADM get collapsed
2. Normalization: Data elements get distributed across a larger set of entities
3. Harmonization: Refactoring reduces total number of properties by half
ADM
refactoring
144 specimen
properties in
total
CDM
74 specimen
properties in
total
Refactoring results in a much more normalized and deeply harmonized CDM
UML
Diagram of
CDM
Entities
and
Attributes
(link)
The Conceptual Domain Model (CDM)
Entities in the CDM
prototype, and the
attributes held by each
Attribute count shown in
parentheses.
CDM Data
Dictionary
(link)
● The CDM prototype is specified as a spreadsheet-based data dictionary
● Entities and their Attributes are each described in a separate sheet
● Cardinality of attributes is specified to be as permissive as possible initially
● Data Types are minimally specified
○ Simple: declared only at a high level (limited to literal, boolean)
○ Complex: proposals for Identifier, Coding, DateTime, Quantity, . . .
● A ‘Referenced Entities’ sheet lists entities that are referenced in CDM relationships,
but are not in scope to model in this phase of work.
○ e.g. Organization, Visit, ConditionDiagnosis
● A ‘Data Containers’ sheet holds placeholders for objects that will be defined to group
sets of related properties (specific structures for these t.b.d.)
● Mappings of several types are also provided in the main Entity sheets:
○ ADM attributes that map to each CDM attribute (column L)
○ Source node attributes aggregated by these ADM attributes (column M)
○ CDM to FHIR mappings (column N)
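One row of the data dictionary described above could be represented roughly as follows. This is a sketch under stated assumptions: the class and field names are hypothetical, the default cardinality `0..*` is one reading of "as permissive as possible", and the example values are illustrative rather than copied from the actual spreadsheet.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Coding:
    """One of the proposed complex data types (structure assumed here)."""
    system: str
    code: str
    label: str = ""

@dataclass
class AttributeEntry:
    """A single attribute row in an entity sheet (hypothetical field names)."""
    entity: str
    name: str
    data_type: str = "literal"      # simple types are declared only at a high level
    cardinality: str = "0..*"       # assumed most-permissive default
    adm_attributes: List[str] = field(default_factory=list)     # column L
    source_attributes: List[str] = field(default_factory=list)  # column M
    fhir_mappings: List[str] = field(default_factory=list)      # column N

# Illustrative row; the CDM attribute name is an assumption.
entry = AttributeEntry(
    entity="Specimen",
    name="freezing_method",
    adm_attributes=["ADM.Sample.freezing_method"],
)
```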
The Conceptual Domain Model (CDM)
Excerpt from the ‘Specimen’ sheet of the CDM Data Dictionary (link)
Attribute Definitions Mappings
BRIDG bridgmodel.nci.nih.gov
● A detailed and highly-normalized conceptual model
covering the domains of clinical and translational research
(a mapping ‘hub’, not an implementation model)
● ADM mappings to BRIDG support a deeper
understanding of source model elements, keep our data
model grounded in reality, and enable cross-mapping to
other BRIDG-mapped models
FHIR hl7.org/fhir
● A data exchange model and API framework covering
patient-level healthcare information generated in EHRs
● ADM mappings to FHIR provide a pragmatic target to
guide ADM->CDM refactoring, as alignment can enable
interoperability with clinical data systems, and potentially
lets us leverage FHIR infrastructure and tools
Mapping CCDH Models to BRIDG and FHIR
Mappings from Sources and the CDM to BRIDG and FHIR can be derived from ADM mappings to each of these models
BRIDG mapping path for ADM.Sample.freezing_method:
BiologicSpecimen <--beAFunctionPerformedBy-- Subject
<--beParticipatedInBy-- PerformedMaterialProcessStep.methodCode
WHERE PerformedMaterialProcessStep --instantiate-->
DefinedMaterialProcessStep.nameCode="freeze"
FHIR elements required to represent ADM Sample
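Deriving source-to-standard mappings from ADM mappings amounts to composing two lookups. A minimal sketch, assuming mappings are held as plain dictionaries; the `compose` helper and the source element names are hypothetical, and the BRIDG target is abbreviated to the final attribute of the path shown on the slide.

```python
def compose(source_to_adm: dict, adm_to_target: dict) -> dict:
    """Derive source -> target mappings by following each source element through the ADM."""
    derived = {}
    for source_element, adm_element in source_to_adm.items():
        targets = adm_to_target.get(adm_element)
        if targets:  # skip ADM elements that have no target mapping yet
            derived[source_element] = targets
    return derived

# Hypothetical source element names, for illustration only.
source_to_adm = {
    "GDC.sample.freezing_method": "ADM.Sample.freezing_method",
    "ICDC.sample.sample_id": "ADM.Sample.sample_id",
}
adm_to_bridg = {
    "ADM.Sample.freezing_method": ["PerformedMaterialProcessStep.methodCode"],
}
derived = compose(source_to_adm, adm_to_bridg)
```

Source elements whose ADM attribute has no target mapping are simply absent from the result, which makes the remaining mapping gaps easy to list.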
Phase II Activities: Multiple Workstreams in Parallel
● Test / validate the CDM prototype against node data, competency questions, and feedback from stakeholders.
● Incorporate additional CRDC source models into the ADM (e.g. IDC) (Steps 1 and 2)
● Refactor additional ADM subdomains into the CDM (e.g. clinical metadata) (Steps 3 and 4)
● Evolve mature CDM content into an implementable logical model (Step 5)
● Terminological / value set harmonization
Key CCDH Modeling Work Products
May 2020 Deliverable Package
ID | Name | Description | Archived Document
WP0 | May 2020 Phase 1 Report | Short document describing work performed and products generated in this phase of work. | gdoc
WP1 | BRIDG and FHIR Mappings | A spreadsheet with detailed and provenanced mappings of ADM elements to BRIDG and FHIR | xls
WP2 | BRIDG and FHIR Covering Model Diagrams | UML-like views of elements in the BRIDG and FHIR models required to represent ADM entities. | pdf
WP3 | CDM Entity and Attribute Diagram | A class diagram providing a high-level view of the CDM | pdf
WP5 | CDM Dictionary (and Mappings) | A data dictionary spreadsheet detailing the Conceptual Domain Model, and its attribute-level mappings to the ADM and FHIR. | gsheets
WP6 | ADM Representation in FHIR | A representation of ADM entities using FHIR metamodeling language and tooling | gdoc
February 2020 Deliverable Package
ID | Name | Description | Archived Document
WP1 | CRDC Node concept maps | A side-by-side view of the core models implemented by GDC, PDC, and ICDC nodes. | png
WP3 | CRDC Data Model Dictionaries | One document with separate spreadsheets for GDC, PDC, and ICDC models. | gsheets
WP5 | Aggregated Model Concept Map | A high level view of the entities and relationships in the aggregated model. | png
WP6 | Aggregated Data Dictionary | Spreadsheets describing all elements from the Aggregated Model, and mappings to source elements. | gsheets
Ontology landscape and
requirements for
terminologies and tools
Delivering terminological & data model content to support data
ingest / data harmonization within each node
● Provide tools to facilitate use of the harmonized data model and terminology by
nodes
○ Harmonized data and terminologies enable access to data via CDA
● Metadata validation leveraging the harmonized terminology
● Mapping incoming datasets to the harmonized model
● Migration across harmonized model versions
● Leverage existing tools, existing terminologies, where possible
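Metadata validation against a harmonized terminology can be sketched as a value-set membership check. This is a minimal illustration, not a CCDH tool: the `validate_record` function, the attribute name, and the one-member value set are assumptions made for the example.

```python
def validate_record(record: dict, value_sets: dict) -> list:
    """Return human-readable errors for attributes that violate their value set."""
    errors = []
    for attribute, allowed in value_sets.items():
        if attribute not in record:
            errors.append(f"{attribute}: missing")
        elif record[attribute] not in allowed:
            errors.append(f"{attribute}: {record[attribute]!r} not in value set")
    return errors

# Toy value set; a real one would be served by a terminology service.
value_sets = {"smoking_status": {"NCIT:C154510"}}

ok = validate_record({"smoking_status": "NCIT:C154510"}, value_sets)
bad = validate_record({"smoking_status": "3 packs per day"}, value_sets)
```

Returning a list of errors, rather than raising on the first one, lets a node see every problem in a submission at once.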
Behind every data model are the tools and terminologies that make it work
Terminology tools and services landscape assessment
What already exists? What can be best utilized or adapted for the CRDC? What are the gaps?
Admin/Access
Licensing
Registration
Authentication
Publication
Version management
Change management
Automated updates
UI/Browse/Search
Term search/Autocomplete
UI for navigation
Querying, filtering
Synonym support
Visualization
Community use indicated/tracked
API
Standard
Named entity recognition
Validation
Transitive closure
Identifiers
URIs
Dereferencing
Mapping
Serves maps
Map curation and authoring
Map validation
Value set services
Formats
Semantic typing
Inputs, outputs, OWL2, etc.
Data annotation and QC tools
What already exists? What can be best utilized or adapted for the CRDC? What are the gaps?
Mapping and Transformation
standardization
NLP/named entity recognition
semantic similarity
Metadata Validation and QC
value sets
logical constraints
syntax
Data Annotation
template building
term search
terminology browsing
Examples
CEDAR
Ptolemy.V
Metadata Validation Service
Simple Terminology Server
FHIR Terminology Server
OpenRefine
RDF shapes (ShEx/SHACL)
ISO 11179 Metadata Registries (MDR)
● Provenance / history
● Contacts / managing organization
● Semantics - what the elements in a data model represent
ISO 11179-3 - registry metamodel and basic attributes carry a standard model of
“binding” -- how one associates ontology meaning with both the data element itself
and its content.
Standard for recording information about data models
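The idea of binding meaning to both a data element and its content can be sketched with two small classes. This is a heavily simplified illustration, not the ISO 11179-3 metamodel: the class and field names are hypothetical, and the element-level concept code is a placeholder rather than a real identifier.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class PermissibleValue:
    value: str          # what is stored in the data
    meaning_code: str   # concept that the stored value stands for

@dataclass
class DataElement:
    name: str
    concept_code: str   # meaning of the element itself
    permissible_values: List[PermissibleValue] = field(default_factory=list)

    def meaning_of(self, value: str) -> Optional[str]:
        """Look up the concept bound to a stored value, if any."""
        for pv in self.permissible_values:
            if pv.value == value:
                return pv.meaning_code
        return None

element = DataElement(
    name="smoking_status",
    concept_code="EXAMPLE:SmokingStatus",  # placeholder, not a real code
    permissible_values=[PermissibleValue("heavy", "NCIT:C154510")],
)
```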
ISO 11179 Model of meaning / Model of representation
Current roles in caDSR + NCI Thesaurus
RDF as the great “blender”
ADM Models
Represented
using FHIR
Metamodel,
and generated
documentation
https://fhir.hotecosystem.org/ccdh/fhir/, https://fhir.hotecosystem.org/ccdh/fhir/aliquot.html
FHIR as a Modeling Framework
FHIR into the RDF blender
RDF blender to FHIR
Putting it all together
Model in Google Sheets
Putting it all together -- do we need a unifying representation?
Model in Google Sheets
Acknowledgments
Center for Biomedical Informatics &
Information Technology
● Allen Dearry
● Sherri de Coronado
● Erika Kim
● Denise Warzel
● Melissa Cook
Samvit Solutions
● Smita Hastak
● Wendy Ver Hoef
● Charles Yaghmour
● Todd Pihl
● Resham Kulkarni
Frederick National Laboratory
for Cancer Research
DCF: Data Commons Framework - Infrastructure
HTAN: Human Tumor Atlas Network
ICDC: Integrated Canine Data Commons
IDC: Imaging Data Commons
GDC: Genomic Data Commons
PDC: Proteomics Data Commons
SevenBridges
Gabriella Miller Kids First Data Resource Center
CDS: Cancer Data Services
Broad Institute FireCloud
Institute for Systems Biology
NBIA: National Biomedical Imaging Archive
SEER Virtual Tissue Repository
CIDC: Cancer Immunology Data Commons
Cancer Data
Aggregator
● Brian O’Connor
● Alex Baumann
● David Pot
● Jack DiGiovanna
● Cara Mason

Empowering patients by increasing accessibility to clinical terminologyEmpowering patients by increasing accessibility to clinical terminology
Empowering patients by increasing accessibility to clinical terminology
 
Data science education resources for everyone
Data science education resources for everyoneData science education resources for everyone
Data science education resources for everyone
 
Enhancing the Human Phenotype Ontology for Use by the Layperson
Enhancing the Human Phenotype Ontology for Use by the LaypersonEnhancing the Human Phenotype Ontology for Use by the Layperson
Enhancing the Human Phenotype Ontology for Use by the Layperson
 
Enhancing the Human Phenotype Ontology for Use by the Layperson
Enhancing the Human Phenotype Ontology for Use by the LaypersonEnhancing the Human Phenotype Ontology for Use by the Layperson
Enhancing the Human Phenotype Ontology for Use by the Layperson
 
Couture Curricula - BD2K Data Science Tailored to Your Needs
Couture Curricula - BD2K Data Science Tailored to Your NeedsCouture Curricula - BD2K Data Science Tailored to Your Needs
Couture Curricula - BD2K Data Science Tailored to Your Needs
 
Monarch Initiative Poster - Rare Disease Symposium 2015
Monarch Initiative Poster - Rare Disease Symposium 2015Monarch Initiative Poster - Rare Disease Symposium 2015
Monarch Initiative Poster - Rare Disease Symposium 2015
 
Acrl march2015 final
Acrl march2015 finalAcrl march2015 final
Acrl march2015 final
 
The Role of Libraries in Data Management and Curation
The Role of Libraries in Data Management and CurationThe Role of Libraries in Data Management and Curation
The Role of Libraries in Data Management and Curation
 
Resource Identification Initiative_RDA_March2014
Resource Identification Initiative_RDA_March2014 Resource Identification Initiative_RDA_March2014
Resource Identification Initiative_RDA_March2014
 
On the Reproducibility of Science: Unique Identification of Research Resourc...
On the Reproducibility of Science: Unique Identification of  Research Resourc...On the Reproducibility of Science: Unique Identification of  Research Resourc...
On the Reproducibility of Science: Unique Identification of Research Resourc...
 
Research resources: curating the new eagle-i discovery system
Research resources: curating the new eagle-i discovery systemResearch resources: curating the new eagle-i discovery system
Research resources: curating the new eagle-i discovery system
 

Recently uploaded

Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxnoordubaliya2003
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationColumbia Weather Systems
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPirithiRaju
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPirithiRaju
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPirithiRaju
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentationtahreemzahra82
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)Columbia Weather Systems
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naJASISJULIANOELYNV
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)riyaescorts54
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxBerniceCayabyab1
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensorsonawaneprad
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationColumbia Weather Systems
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRlizamodels9
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayupadhyaymani499
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 

Recently uploaded (20)

Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptxSulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
Sulphur & Phosphrus Cycle PowerPoint Presentation (2) [Autosaved]-3-1.pptx
 
User Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather StationUser Guide: Magellan MX™ Weather Station
User Guide: Magellan MX™ Weather Station
 
Pests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdfPests of Bengal gram_Identification_Dr.UPR.pdf
Pests of Bengal gram_Identification_Dr.UPR.pdf
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdfPests of Blackgram, greengram, cowpea_Dr.UPR.pdf
Pests of Blackgram, greengram, cowpea_Dr.UPR.pdf
 
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdfPests of soyabean_Binomics_IdentificationDr.UPR.pdf
Pests of soyabean_Binomics_IdentificationDr.UPR.pdf
 
Harmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms PresentationHarmful and Useful Microorganisms Presentation
Harmful and Useful Microorganisms Presentation
 
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
User Guide: Pulsar™ Weather Station (Columbia Weather Systems)
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
(9818099198) Call Girls In Noida Sector 14 (NOIDA ESCORTS)
 
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptxGenBio2 - Lesson 1 - Introduction to Genetics.pptx
GenBio2 - Lesson 1 - Introduction to Genetics.pptx
 
Environmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial BiosensorEnvironmental Biotechnology Topic:- Microbial Biosensor
Environmental Biotechnology Topic:- Microbial Biosensor
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort ServiceHot Sexy call girls in  Moti Nagar,🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Moti Nagar,🔝 9953056974 🔝 escort Service
 
User Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather StationUser Guide: Capricorn FLX™ Weather Station
User Guide: Capricorn FLX™ Weather Station
 
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCRCall Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
Call Girls In Nihal Vihar Delhi ❤️8860477959 Looking Escorts In 24/7 Delhi NCR
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Citronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyayCitronella presentation SlideShare mani upadhyay
Citronella presentation SlideShare mani upadhyay
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 

Introduction to CCDH Joint Meeting Recap

  • 1. An Introduction to CCDH Joint meeting of the CRDC & the Center for Cancer Data Harmonization Date: June 29, 2020 https://datascience.cancer.gov/data-commons/center-cancer-data-harmonization-ccdh These slides: bit.ly/ccdh-crdc-june-2
  • 2. These slides: bit.ly/ccdh-crdc-june-2 Joint meeting of the CRDC & the Center for Cancer Data Harmonization Date: June 29, 2020 https://datascience.cancer.gov/data-commons/center-cancer-data-harmonization-ccdh
  • 3. Outline ● Synthesis of information from CRDC and insights derived | Sam, Melissa ● Presentation of Harmonized Data Model | Brian & Matt ● Ontology landscape and terminological requirements | Jim, Harold, Dazhi
  • 4. Community Development (Lead: Volchenboum; Co-Lead: Vasilevsky) Data Model harmonization (Lead: Chute; Co-Lead: Furner) Ontology & Terminology Ecosystem (Lead: Solbrig) Tools & Data Quality (Lead: Balhoff) Programmatic oversight CBIIT: Sherri De Coronado, Allen Dearry FNL: Todd Pihl, Resham Kulkarni Program Management and operations (Lead: Haendel, Co-Lead: Munoz-Torres)
  • 5. Role of CCDH in the CRDC ecosystem Facilitate retrospective and prospective semantic harmonization of data across nodes of the CRDC Coordinate the community to ensure quality “fit for purpose” design and implementation of standards that will facilitate interoperability of heterogeneous data types and CRDC resources Find agreement across the communities built around CRDC - match and extend data models - annotation, harmonization - quality assurance
  • 6. Data Model harmonization (Lead: Chute Co-Lead: Furner) Ontology & Terminology Ecosystem (Lead: Solbrig) Tools & Data Quality (Lead: Balhoff) Schema to schema OMOP to FHIR Term to Term Oncotree to NCIt Data records to data records “Smoking status 3 packs per day” to NCIT:C154510 [Heavy Smoker]
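The "data records to data records" example on slide 6 can be illustrated with a minimal Python sketch. Only the target code NCIT:C154510 [Heavy Smoker] comes from the slide; the function name, the regular expression, and the 1.0 packs-per-day cutoff are hypothetical placeholders for illustration, not a CCDH or NCIt rule.

```python
import re

# Target code quoted on the slide: NCIT:C154510 [Heavy Smoker].
HEAVY_SMOKER = "NCIT:C154510"

# Hypothetical pattern for free-text values such as "3 packs per day".
PACKS_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*packs?\s+per\s+day", re.IGNORECASE)


def map_smoking_status(text, heavy_threshold=1.0):
    """Map a free-text smoking status to a coded term, or None if no rule applies.

    heavy_threshold is an assumed cutoff chosen only for this sketch.
    """
    match = PACKS_PATTERN.search(text)
    if match is None:
        return None  # no packs-per-day quantity found; leave the record unmapped
    packs = float(match.group(1))
    return HEAVY_SMOKER if packs >= heavy_threshold else None


result = map_smoking_status("Smoking status 3 packs per day")
```

Returning None for unmatched values keeps unmapped records visible for manual curation rather than forcing them into a code.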
  • 7. Synthesis of information from CRDC and insights derived
  • 8. Community Development Working Group Goals: ● Engage CRDC stakeholders: interviews to identify and document semantic priorities ● Document current platforms ● Develop plans to support core semantic standards and concierge services Completed interviews DCF: Data Commons Framework - Infrastructure Node HTAN: Human Tumor Atlas Network ICDC: Integrated Canine Data Commons IDC: Imaging Data Commons GDC: Genomic Data Commons PDC: Proteomics Data Commons Future interviews Gabriella Miller Kids First Data Resource Center Node CDS: Cancer Data Services Broad Institute FireCloud Institute for Systems Biology SevenBridges NBIA: National Biomedical Imaging Archive SEER Virtual Tissue Repository CIDC: Cancer Immunology Data Commons Summary matrix from initial interviews
  • 9. Community Development - Phase II - Pilot ● Provision of help desk services (office hours and GitHub issue tracker) ● Data preparation services ○ mapping and transformations of terminologies and models ○ metadata validation ○ data annotation ● Web portal development ● Work with the nodes to assist mapping and transformation of data ● Develop user support documentation and materials Main user base is the node developers But these users will also benefit
  • 10. Establish a transparent process for community discussion, modification, and acceptance of new or modified content (GitHub) Community Development - Phase III/IV - Production and Operations Concierge services for CRDC nodes, DCC, DCF, other end users Continue collecting user questions and feedback to improve services and identify user needs and pain points Enable the users to find the resources they need and to be able to use the portal independently Web portal enhancements / load testing Unit tests / QC
  • 12. ● Will provide a single data model that harmonizes syntax and semantics across the CRDC systems and services. ● This CRDC-H model will enable data aggregation and exchange to facilitate integrated search, navigation, and metadata-based analysis ● We will align with community standards where possible (e.g. FHIR, BRIDG) to promote broader interoperability, and leverage mappings and tools provided by these efforts Data Model Harmonization: Overview Ecosystem of CRDC repositories, services and stakeholders
  • 13. 1. Standardize Source Data Model Documentation 2. Generate an Aggregated Data Model (ADM) 3. Map the ADM to Community Standard Data Models 4. Refactor the ADM into a Conceptual Domain Model (CDM) 5. Refactor the CDM to a Logical Data Model (CRDC-H) An iterative process through which source model content is evaluated, aggregated, mapped, and refactored into a standards-aligned and harmonized data model. CRDC-H Model Development Workflow Abstract specification Low harmonization Not standards-aligned Concrete specification Deep harmonization Standards-aligned
  • 14. ● Targeted four source models (GDC, PDC, ICDC, HTAN) ● Focused on Biospecimen and Administrative subdomains ● Harmonized entities and attributes, not data types or value sets/terminologies ● Informed by BRIDG and FHIR standards ● Produced an exploratory conceptual model (does not yet support implementation) Lessons learned from this narrow but deep dive will inform subsequent iterations that incorporate new data sources and subdomains. Phase I: CDM Prototype Development Abstract specification Low harmonization Not standards-aligned Concrete specification Deep harmonization Standards-aligned
  • 15. The Aggregated Data Model (ADM) GDC 26 entities, 561 attributes ADM 55 entities, 984 attributes PDC 27 entities, 500 attributes ICDC 27 entities, 265 attributes
  • 16. The Aggregated Data Model (ADM) A substrate for refactoring into more deeply harmonized models Node models are not well aligned at the outset ● e.g. ICDC and GDC: ~30% entity equivalence, <5% attribute equivalence Property aggregation in the ADM is based on superficial analysis and strict aggregation criteria - so harmonization is minimal ● Only strictly equivalent elements within strictly equivalent entities are merged Deeper aggregation and harmonization of elements will be achieved as the ADM is refactored into the CDM ● Terminological - e.g. GDC 'Treatment' vs ICDC 'Agent Administration' ● Structural - e.g. ICDC provides a more normalized model for clinical metadata ● Semantic - e.g. harmonizing disease terminologies used across systems / species ● Precision - e.g. variable detail provided about tumor staging across models
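The strict aggregation rule on slide 16 (merge only strictly equivalent elements within strictly equivalent entities) can be sketched in Python as below. This is an illustrative assumption, not the CCDH tooling: equivalence is reduced here to exact name matching, and the entity and attribute names in the sample input are hypothetical.

```python
from collections import defaultdict


def aggregate_strict(source_models):
    """Build an aggregated model, merging an attribute across nodes only when
    both its entity name and its attribute name match exactly."""
    merged = defaultdict(lambda: defaultdict(set))
    for node, entities in source_models.items():
        for entity, attributes in entities.items():
            for attribute in attributes:
                merged[entity][attribute].add(node)
    # Convert to plain dicts, listing the contributing nodes for each attribute.
    return {
        entity: {attr: sorted(nodes) for attr, nodes in attrs.items()}
        for entity, attrs in merged.items()
    }


# Hypothetical excerpts of two node models, for illustration only.
source_models = {
    "GDC": {"sample": ["sample_id", "freezing_method"], "treatment": ["treatment_type"]},
    "ICDC": {"sample": ["sample_id"], "agent_administration": ["agent_name"]},
}
adm = aggregate_strict(source_models)
```

In this sketch 'treatment' and 'agent_administration' remain separate entities, which mirrors the slide's point that terminological differences are only resolved later, when the ADM is refactored into the CDM.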
  • 17. High-Level Structural Changes Resulting from ADM Refactoring (Biospecimen Subdomain) The Conceptual Domain Model (CDM) Prototype 1. Specialization: Specialized specimen subtypes in the ADM get collapsed 2. Normalization: Data elements get distributed across a larger set of entities 3. Harmonization: Refactoring reduces total number of properties by half ADM refactoring 144 specimen properties in total CDM 74 specimen properties in total Refactoring results in a much more normalized and deeply harmonized CDM model
  • 18. UML Diagram of CDM Entities and Attributes (link) The Conceptual Domain Model (CDM) Entities in the CDM prototype, and the attributes held by each Attribute count shown in parentheses.
  • 19. CDM Data Dictionary (link) ● The CDM prototype is specified as a spreadsheet-based data dictionary ● Entities and their Attributes are each described in a separate sheet ● Cardinality of attributes is specified to be as permissive as possible initially ● Data Types are minimally specified ○ Simple: declared only at a high level (limited to literal, boolean) ○ Complex: proposals for Identifier, Coding, DateTime, Quantity, . . . ● A ‘Referenced Entities’ sheet lists entities that are referenced in CDM relationships, but are not in scope to model in this phase of work. ○ e.g. Organization, Visit, ConditionDiagnosis ● A ‘Data Containers’ sheet holds placeholders for objects that will be defined to group sets of related properties (specific structures for these t.b.d.) ● Mappings of several types are also provided in the main Entity sheets: ○ ADM attributes that map to each CDM attribute (column L) ○ Source node attributes aggregated by these ADM attributes (column M) ○ CDM to FHIR mappings (column N) The Conceptual Domain Model (CDM)
  • 20. The Conceptual Domain Model (CDM) Excerpt from the ‘Specimen’ sheet of the CDM Data Dictionary (link) Attribute Definitions Mappings
  • 21. BRIDG bridgmodel.nci.nih.gov ● A detailed and highly-normalized conceptual model covering the domains of clinical and translational research (a mapping ‘hub’, not an implementation model) ● ADM mappings to BRIDG support a deeper understanding of source model elements, keep our data model grounded in reality, and enable cross-mapping to other BRIDG-mapped models FHIR hl7.org/fhir ● A data exchange model and API framework covering patient-level healthcare information generated in EHRs ● ADM mappings to FHIR provide a pragmatic target to guide ADM->CDM refactoring, as alignment can enable interoperability with clinical data systems, and potentially lets us leverage FHIR infrastructure and tools Mapping CCDH Models to BRIDG and FHIR Mappings from Sources and the CDM to BRIDG and FHIR can be derived from ADM mappings to each of these models BiologicSpecimen <--beAFunctionPerformedBy-- Subject <--beParticipatedInBy-- PerformedMaterialProcessStep.methodCode WHERE PerformedMaterialProcessStep --instantiate--> DefinedMaterialProcessStep.nameCode="freeze" BRIDG mapping path for ADM.Sample.freezing_method: FHIR elements required to represent ADM Sample
  • 22. ● Test / validate the CDM prototype against node data, competency questions, and feedback from stakeholders. ● Incorporate additional CRDC source models into the ADM (e.g. IDC) (Steps 1 and 2) Phase II Activities: Multiple Workstreams in Parallel ● Refactor additional ADM subdomains into the CDM (e.g. clinical metadata) (Steps 3 and 4) ● Evolve mature CDM content into an implementable logical model (Step 5) ● Terminological / value set harmonization
  • 23. Key CCDH Modeling Work Products
    May 2020 Deliverable Package
    ID | Name | Description | Archived Document
    WP0 | May 2020 Phase 1 Report | Short document describing work performed and products generated in this phase of work. | gdoc
    WP1 | BRIDG and FHIR Mappings | A spreadsheet with detailed and provenanced mappings of ADM elements to BRIDG and FHIR | xls
    WP2 | BRIDG and FHIR Covering Model Diagrams | UML-like views of elements in the BRIDG and FHIR models required to represent ADM entities. | pdf
    WP3 | CDM Entity and Attribute Diagram | A class diagram providing a high-level view of the CDM | pdf
    WP5 | CDM Dictionary (and Mappings) | A data dictionary spreadsheet detailing the Conceptual Domain Model, and its attribute-level mappings to the ADM and FHIR. | gsheets
    WP6 | ADM Representation in FHIR | A representation of ADM entities using FHIR metamodeling language and tooling | gdoc
    February 2020 Deliverable Package
    ID | Name | Description | Archived Document
    WP1 | CRDC Node concept maps | A side-by-side view of the core models implemented by GDC, PDC, and ICDC nodes. | png
    WP3 | CRDC Data Model Dictionaries | One document with separate spreadsheets for GDC, PDC, and ICDC models. | gsheets
    WP5 | Aggregated Model Concept Map | A high level view of the entities and relationships in the aggregated model. | png
    WP6 | Aggregated Data Dictionary | Spreadsheets describing all elements from the Aggregated Model, and mappings to source elements. | gsheets
  • 24. Ontology landscape and requirements for terminologies and tools
  • 25. Delivering terminological & data model content to support data ingest / data harmonization within each node ● Provide tools to facilitate use of the harmonized data model and terminology by nodes ○ Harmonized data and terminologies enable access to data via CDA ● Metadata validation leveraging the harmonized terminology ● Mapping incoming datasets to the harmonized model ● Migration across harmonized model versions ● Leverage existing tools, existing terminologies, where possible Behind every data model are the tools and terminologies that make it work
  • 26. Terminology tools and services landscape assessment What already exists? What can be best utilized or adapted for the CRDC? What are the gaps? Admin/Access Licensing Registration Authentication Publication Version management Change management Automated updates UI/Browse/Search Term search/Autocomplete UI for navigation Querying, filtering Synonym support Visualization Community use indicated/tracked API Standard Named entity recognition Validation Transitive closure Identifiers URIs Dereferencing Mapping Serves maps Map curation and authoring Map validation Value set services Formats Semantic typing Inputs, outputs, OWL2, etc.
  • 27. Data annotation and QC tools What already exists? What can be best utilized or adapted for the CRDC? What are the gaps? Mapping and Transformation standardization NLP/named entity recognition semantic similarity Metadata Validation and QC value sets logical constraints syntax Data Annotation template building term search terminology browsing Examples CEDAR Ptolemy.V Metadata Validation Service Simple Terminology Server FHIR Terminology Server OpenRefine RDF shapes (ShEx/SHACL)
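Metadata validation against value sets, one of the QC functions listed on slide 27, might look like the following minimal Python sketch. It does not represent any of the tools named on the slide; the function, the field name, and the single-member value set are assumptions made for illustration (the code NCIT:C154510 is taken from slide 6).

```python
def validate_record(record, value_sets):
    """Return a list of error messages for fields that are missing or whose
    value falls outside the allowed value set."""
    errors = []
    for field, allowed in value_sets.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif record[field] not in allowed:
            errors.append(f"{field}: {record[field]!r} not in value set")
    return errors


# Hypothetical value set binding a field to permitted coded terms.
value_sets = {"smoking_status": {"NCIT:C154510"}}

ok_errors = validate_record({"smoking_status": "NCIT:C154510"}, value_sets)
bad_errors = validate_record({"smoking_status": "heavy"}, value_sets)
```

Collecting all errors instead of stopping at the first lets a node developer see every problem in a submission in one pass.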
  • 28. ISO 11179 Metadata Registries (MDR) ● Provenance / history ● Contacts / managing organization ● Semantics - what the elements in a data model represent ISO 11179-3 - registry metamodel and basic attributes carry a standard model of “binding” -- how one associates ontology meaning with both the data element itself and its content. Standard for recording information about data models
  • 29. ISO 11179 Model of meaning / Model of representation
  • 30. Current roles in caDSR + NCI Thesaurus
  • 31. RDF as the great “blender”
  • 32. ADM Models Represented using FHIR Metamodel, and generated documentation https://fhir.hotecosystem.org/ccdh/fhir/, https://fhir.hotecosystem.org/ccdh/fhir/aliquot.html FHIR as a Modeling Framework
  • 33. FHIR into the RDF blender
  • 35. Putting it all together Model in Google Sheets
  • 36. Putting it all together -- do we need a unifying representation? Model in Google Sheets
  • 37. Acknowledgments Center for Biomedical Informatics & Information Technology ● Allen Dearry ● Sherri de Coronado ● Erika Kim ● Denise Warzel ● Melissa Cook Samvit Solutions ● Smita Hastak ● Wendy Ver Hoef ● Charles Yaghmour ● Todd Pihl ● Resham Kulkarni Frederick National Laboratory for Cancer Research DCF: Data Commons Framework - Infrastructure HTAN: Human Tumor Atlas Network ICDC: Integrated Canine Data Commons IDC: Imaging Data Commons GDC: Genomic Data Commons PDC: Proteomics Data Commons SevenBridges Gabriella Miller Kids First Data Resource Center CDS: Cancer Data Services Broad Institute FireCloud Institute for Systems Biology NBIA: National Biomedical Imaging Archive SEER Virtual Tissue Repository CIDC: Cancer Immunology Data Commons Cancer Data Aggregator ● Brian O’Connor ● Alex Baumann ● David Pot ● Jack DiGiovanna ● Cara Mason