Healthdata.gov Metadata:
Lifting Schemes and Controlled
Vocabularies
Mark Musen
Natasha Noy
National Center for Biomedical Ontology
Stanford Center for Biomedical Informatics Research
Stanford University
National Center for Biomedical
Ontology
• We create and maintain a library of
biomedical ontologies and terminologies.
• We build tools and Web services to enable
the use of ontologies and terminologies.
• We collaborate with scientific communities
that develop and use ontologies and
terminologies in biomedicine.
Controlled Vocabularies in Healthcare
• Healthcare Common Procedure Coding System (HCPCS)
• Current Procedural Terminology (CPT)
• International Classification of Diseases (ICD)
• RxNorm
• The National Drug File - Reference Terminology (NDF-RT)
National Center for Biomedical
Ontology
• Provides key technology for
– Describing medical terminologies
– Accessing information about terms in medical terminologies
– Using mappings across terminologies
– Annotating content with terms from medical
terminologies
BioPortal stores Big Data about Big Data
Mappings Between Terminologies
• Available through a REST API and SPARQL endpoint
• Example: Term mappings from HCPCS to CPT
– Source term
– Mapped to CPT: lexically and through UMLS CUI
– Mapping metadata
Protégé: Editing Ontologies and
Terminologies
• An open-source ontology editor
• Has more than 200,000 registered users
• Works in a web browser
• Supports collaboration
• Built on Semantic Web standards
• Has dozens of plugins developed by our
user community
Health Data Platform
Metadata Challenge
Each Dataset Is
(Largely) a Silo
The Anatomy of a Dataset
• Basic metadata about the dataset: published by HHS, describes Part B National Summary Data File, covers the period from Jan 1, 2000 to Dec 31, 2000, ...
• Metadata about the structure of the dataset: allowed services, charges, and payments for each HCPCS code and modifier
• Content of the dataset: charge of $4,966 for radiology procedure
Focus of the Metadata Challenge
The NCBO Response to the
Healthdata.gov Metadata Challenge
• Protégé to create dataset descriptions
• Custom scripts to extract metadata
• BioPortal to provide terminologies
• SPARQL endpoint to access the resulting
knowledge graph
NCBO Solution: Outline
• Basic metadata about the dataset
• Metadata about the structure of the dataset
• Anchoring values in controlled terminologies
http://healthdata.bioontology.org
Bring Healthdata.gov Datasets
into the Linked Open Data Cloud
Step 1: Basic Metadata
About the Dataset
Links to Additional Metadata and
Vocabularies
Linking Healthdata Datasets
to Other Metadata
Find all reports authored by
the Department of Health and
Human Services
and its agencies
annually
Ask it in SPARQL
Find all reports authored by
the Department of Health and Human Services
and its agencies annually
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX org: <http://www.w3.org/ns/org#>
SELECT DISTINCT ?title WHERE {
  ?ds dc:accrualPeriodicity ?per .
  ?per rdfs:label "Annual" .
  ?ds dc:creator ?ag .
  ?ag org:subOrganizationOf
      dbpedia:United_States_Department_of_Health_and_Human_Services .
  ?ds dc:title ?title
}
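A query like this one is typically sent to a SPARQL endpoint as an HTTP GET request with the query text in the `query` parameter, as the SPARQL 1.1 Protocol describes. The sketch below only builds the request URL and sends nothing. The endpoint address `http://example.org/sparql` is a placeholder, not the address of the NCBO endpoint, and the function name `build_request_url` is made up for illustration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint address; substitute the real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

QUERY = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX org: <http://www.w3.org/ns/org#>
SELECT DISTINCT ?title WHERE {
  ?ds dc:accrualPeriodicity ?per .
  ?per rdfs:label "Annual" .
  ?ds dc:creator ?ag .
  ?ag org:subOrganizationOf
      dbpedia:United_States_Department_of_Health_and_Human_Services .
  ?ds dc:title ?title
}
"""


def build_request_url(endpoint: str, query: str) -> str:
    """Return a GET URL that carries the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_request_url(ENDPOINT, QUERY)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
```

Percent-encoding matters here because the query contains `#`, `?`, `<` and `>`, all of which have their own meaning inside a URL.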
There Is No Free Lunch
• Modeling metadata enables us to understand
how the content is related
• Necessary if we want to integrate data
Dataset Descriptions Are Buried in Text
Files
Part B National Summary Data File: maps HCPCS code and HCPCS modifier to the amount charged for allowed services, allowed charges, and payment
RDF Data Cube Vocabulary
Modeling Dimensions and Measures of the data
PartB National Summary data (rdf:type: qb:DataSet, dcat:Dataset)
  qb:structure → DSD (rdf:type: qb:DataStructureDefinition)
    qb:component → HCPCS/CPT Payments (rdf:type: qb:ComponentSpecification)
      qb:dimension → HCPCS or CPT Code (rdf:type: qb:CodedProperty, qb:DimensionProperty)
        qb:codeList → HCPCS codes (rdf:type: skos:ConceptScheme)
        qb:codeList → CPT codes (rdf:type: skos:ConceptScheme)
      qb:dimension → HCPCS or CPT modifier (rdf:type: qb:CodedProperty, qb:DimensionProperty)
        qb:codeList → HCPCS modifiers (rdf:type: skos:ConceptScheme)
        qb:codeList → CPT modifiers (rdf:type: skos:ConceptScheme)
      qb:measure → Number of services (rdf:type: qb:MeasureProperty)
      qb:measure → Allowed charges (rdf:type: qb:MeasureProperty)
      qb:measure → Payment for services (rdf:type: qb:MeasureProperty)
Structured dataset definition: ~20 triples
Dataset content: ~10,000 to 1,000,000 rows
Find datasets that map HCPCS and CPT codes
to number of services for each code
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX bp: <http://purl.bioontology.org/ontology/>
PREFIX dsd: <http://purl.bioontology.org/healthdata/dsd/>
SELECT DISTINCT ?title WHERE {
  ?dataset a dcat:Dataset .
  ?dataset qb:structure ?dsd .
  ?dsd qb:component ?cmp .
  ?cmp qb:dimension ?dim .
  ?dataset rdfs:label ?title .
  ?dim qb:codeList bp:CPT .
  ?cmp qb:measure dsd:number-of-services .
}
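To see why a small structure definition is enough to answer this question, the sketch below runs the same graph pattern over a handful of in-memory triples, with no triple store involved. It is an illustration only: the node names `ex:partB`, `ex:other`, `dsd:partB-dsd`, `dsd:hcpcs-cpt-payments`, `dsd:hcpcs-cpt-code` and `bp:HCPCS` are made up, prefixed names are treated as plain strings, and the naive join stands in for a real SPARQL engine.

```python
# A tiny hand-written graph. Only bp:CPT and dsd:number-of-services come
# from the query on the slide; every other node name is hypothetical.
TRIPLES = [
    ("ex:partB", "rdf:type", "dcat:Dataset"),
    ("ex:partB", "rdfs:label", "Part B National Summary Data File"),
    ("ex:partB", "qb:structure", "dsd:partB-dsd"),
    ("dsd:partB-dsd", "qb:component", "dsd:hcpcs-cpt-payments"),
    ("dsd:hcpcs-cpt-payments", "qb:dimension", "dsd:hcpcs-cpt-code"),
    ("dsd:hcpcs-cpt-payments", "qb:measure", "dsd:number-of-services"),
    ("dsd:hcpcs-cpt-code", "qb:codeList", "bp:CPT"),
    ("dsd:hcpcs-cpt-code", "qb:codeList", "bp:HCPCS"),
    # A dataset with no structure definition; the query must skip it.
    ("ex:other", "rdf:type", "dcat:Dataset"),
    ("ex:other", "rdfs:label", "Hypothetical dataset without a CPT dimension"),
]


def match(pattern, binding):
    """Yield bindings that extend `binding` so that `pattern` matches a triple.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    for triple in TRIPLES:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield new


def solve(patterns):
    """Join the triple patterns left to right, like a basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return bindings


# The WHERE clause of the query, one tuple per triple pattern.
PATTERNS = [
    ("?dataset", "rdf:type", "dcat:Dataset"),
    ("?dataset", "qb:structure", "?dsd"),
    ("?dsd", "qb:component", "?cmp"),
    ("?cmp", "qb:dimension", "?dim"),
    ("?dataset", "rdfs:label", "?title"),
    ("?dim", "qb:codeList", "bp:CPT"),
    ("?cmp", "qb:measure", "dsd:number-of-services"),
]

# SELECT DISTINCT ?title
titles = sorted({b["?title"] for b in solve(PATTERNS)})
```

The query touches only the structure definition, never the data rows, which is why it stays cheap even when the dataset itself has up to a million rows.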
Controlled Vocabularies in Healthcare
• Healthcare Common Procedure Coding System (HCPCS)
• Current Procedural Terminology (CPT)
• International Classification of Diseases (ICD)
• RxNorm
• The National Drug File - Reference Terminology (NDF-RT)
NCBO BioPortal
• Uniform access to 330 public ontologies
and terminologies in biomedicine
• Web interface
• REST API
• Search across all ontologies
• Resolvable URIs for each term
• Mappings between terms in different
ontologies
From Codes to Ontology Terms
BioPortal as a Terminology Service
• Resolvable Uniform Resource Identifiers (URIs)
• Mappings to other terminologies
– Including metadata
– Including multiple mappings from a variety of sources
• Regular terminology updates from primary
sources
• Ability to define and upload new value sets
– We defined several value sets for the Metadata
Challenge
– Linked them to other terminologies in BioPortal
http://purl.bioontology.org/ontology/CPT/70010
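Turning a raw code from a dataset row into a term URI like the one above can be sketched in a few lines. The base address and the CPT example come from the slide. The helper name `term_uri` is hypothetical, and so is the assumption that other terminologies follow the same `<base>/<acronym>/<code>` pattern.

```python
from urllib.parse import quote

# Base taken from the example URI on the slide.
BASE = "http://purl.bioontology.org/ontology/"


def term_uri(terminology: str, code: str) -> str:
    """Build a term URI from a terminology acronym and a raw code value.

    Assumes the <base>/<acronym>/<code> pattern of the CPT example; other
    terminologies may use a different layout.
    """
    # Strip stray whitespace from the cell value and escape unsafe characters.
    return BASE + quote(terminology, safe="") + "/" + quote(code.strip(), safe="")


cpt_uri = term_uri("CPT", "70010")
```

Here `cpt_uri` equals the URI shown on the slide, so a code in a data row and the term in BioPortal end up sharing one identifier.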
Recipe for Linking
Government Data
• Describe the information about each dataset
– Specify provenance
– Use standard representation schemes
– Use consensus vocabularies
• Describe the metadata about the content of the dataset
– Involve domain experts who understand the structure of the
data
– Use consensus vocabularies (e.g., W3C RDF Data Cube)
• Anchor the values in controlled vocabularies and datasets
– Use existing terminologies
– Define and publish value sets if none exist
– Map the value sets to standard terminologies
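The first step of the recipe, describing a dataset with a standard vocabulary, can be sketched as a couple of N-Triples statements built from the Part B example. This is a minimal illustration rather than the NCBO scripts: the dataset URI under `example.org` is hypothetical, and using Dublin Core `publisher` for "published by HHS" is an assumption about the modeling.

```python
DC = "http://purl.org/dc/terms/"

# Hypothetical dataset identifier, used only for illustration.
DATASET = "http://example.org/dataset/part-b-national-summary"
HHS = ("http://dbpedia.org/resource/"
       "United_States_Department_of_Health_and_Human_Services")


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return "<" + value + ">"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Format one statement as a line of N-Triples."""
    return f"{subject} {predicate} {obj} ."


lines = [
    ntriple(iri(DATASET), iri(DC + "title"),
            literal("Part B National Summary Data File")),
    # "published by HHS" expressed with dcterms:publisher (an assumption).
    ntriple(iri(DATASET), iri(DC + "publisher"), iri(HHS)),
]
document = "\n".join(lines)
```

Pointing at the DBpedia resource for HHS, rather than repeating the agency name as a string, is what links the dataset description to other data on the web.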
What we have learned from the
challenge
• Our entry demonstrated feasibility of
modeling metadata and linking them to
standard vocabularies
– Uses a top-down approach
– Enables “deep” integration
– Extracts knowledge from data
• We need tools and scripts that are specific to
the task of dataset description

More Related Content

What's hot

USUGM 2014 - Zhengwei Peng (Merck): In-depth analysis of patent molecular spa...
USUGM 2014 - Zhengwei Peng (Merck): In-depth analysis of patent molecular spa...USUGM 2014 - Zhengwei Peng (Merck): In-depth analysis of patent molecular spa...
USUGM 2014 - Zhengwei Peng (Merck): In-depth analysis of patent molecular spa...ChemAxon
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod GmodJun Zhao
 
Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244Yasel Cruz
 
II-SDV 2016 Srinivasan Parthiban - KOL Analytics from Biomedical Literature
II-SDV 2016 Srinivasan Parthiban - KOL Analytics from Biomedical LiteratureII-SDV 2016 Srinivasan Parthiban - KOL Analytics from Biomedical Literature
II-SDV 2016 Srinivasan Parthiban - KOL Analytics from Biomedical LiteratureDr. Haxel Consult
 
ONTOFORCE Talk at PharmaTec London 2019 on the Data Surrealism in times of FA...
ONTOFORCE Talk at PharmaTec London 2019 on the Data Surrealism in times of FA...ONTOFORCE Talk at PharmaTec London 2019 on the Data Surrealism in times of FA...
ONTOFORCE Talk at PharmaTec London 2019 on the Data Surrealism in times of FA...Hans Constandt
 
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...Maribel Acosta Deibe
 
Cedar OnDemand: An intelligent browser extension to generate ontology-based m...
Cedar OnDemand: An intelligent browser extension to generate ontology-based m...Cedar OnDemand: An intelligent browser extension to generate ontology-based m...
Cedar OnDemand: An intelligent browser extension to generate ontology-based m...Syed Ahmad Chan Bukhari, PhD
 
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?Dr. Haxel Consult
 
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...MongoDB
 
FAIR History and the Future
FAIR History and the FutureFAIR History and the Future
FAIR History and the FutureCarole Goble
 
Standardization and integration of molecular biology information with DAS
Standardization and integration of molecular biology information with DASStandardization and integration of molecular biology information with DAS
Standardization and integration of molecular biology information with DASRafael C. Jimenez
 
Drug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge GraphsDrug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge GraphsDatabricks
 
Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval Jean Brenda
 
Globus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS AnalysisGlobus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS AnalysisRavi Madduri
 
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)Dag Endresen
 
Semantic enrichment and similarity approximation for biomedical sequence images
Semantic enrichment and similarity approximation for biomedical sequence imagesSemantic enrichment and similarity approximation for biomedical sequence images
Semantic enrichment and similarity approximation for biomedical sequence imagesSyed Ahmad Chan Bukhari, PhD
 

What's hot (20)

USUGM 2014 - Zhengwei Peng (Merck): In-depth analysis of patent molecular spa...
USUGM 2014 - Zhengwei Peng (Merck): In-depth analysis of patent molecular spa...USUGM 2014 - Zhengwei Peng (Merck): In-depth analysis of patent molecular spa...
USUGM 2014 - Zhengwei Peng (Merck): In-depth analysis of patent molecular spa...
 
2009 0807 Lod Gmod
2009 0807 Lod Gmod2009 0807 Lod Gmod
2009 0807 Lod Gmod
 
Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244Nucl. Acids Res.-2014-Howe-nar-gku1244
Nucl. Acids Res.-2014-Howe-nar-gku1244
 
II-SDV 2016 Srinivasan Parthiban - KOL Analytics from Biomedical Literature
II-SDV 2016 Srinivasan Parthiban - KOL Analytics from Biomedical LiteratureII-SDV 2016 Srinivasan Parthiban - KOL Analytics from Biomedical Literature
II-SDV 2016 Srinivasan Parthiban - KOL Analytics from Biomedical Literature
 
ONTOFORCE Talk at PharmaTec London 2019 on the Data Surrealism in times of FA...
ONTOFORCE Talk at PharmaTec London 2019 on the Data Surrealism in times of FA...ONTOFORCE Talk at PharmaTec London 2019 on the Data Surrealism in times of FA...
ONTOFORCE Talk at PharmaTec London 2019 on the Data Surrealism in times of FA...
 
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
 
Cedar OnDemand: An intelligent browser extension to generate ontology-based m...
Cedar OnDemand: An intelligent browser extension to generate ontology-based m...Cedar OnDemand: An intelligent browser extension to generate ontology-based m...
Cedar OnDemand: An intelligent browser extension to generate ontology-based m...
 
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
ICIC 2017: Freeware and public databases: Towards a Wiki Drug Discovery?
 
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
MongoDB and the Connectivity Map: Making Connections Between Genetics and Dis...
 
FAIR History and the Future
FAIR History and the FutureFAIR History and the Future
FAIR History and the Future
 
Standardization and integration of molecular biology information with DAS
Standardization and integration of molecular biology information with DASStandardization and integration of molecular biology information with DAS
Standardization and integration of molecular biology information with DAS
 
Consensus ranking and fragmentation prediction for identification of unknowns...
Consensus ranking and fragmentation prediction for identification of unknowns...Consensus ranking and fragmentation prediction for identification of unknowns...
Consensus ranking and fragmentation prediction for identification of unknowns...
 
How a Structure-Centric Community for Chemists Can Benefit Drug Discovery - V...
How a Structure-Centric Community for Chemists Can Benefit Drug Discovery - V...How a Structure-Centric Community for Chemists Can Benefit Drug Discovery - V...
How a Structure-Centric Community for Chemists Can Benefit Drug Discovery - V...
 
Drug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge GraphsDrug Repurposing using Deep Learning on Knowledge Graphs
Drug Repurposing using Deep Learning on Knowledge Graphs
 
Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval Standard Datasets in Information Retrieval
Standard Datasets in Information Retrieval
 
Globus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS AnalysisGlobus Genomics: Democratizing NGS Analysis
Globus Genomics: Democratizing NGS Analysis
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
Prototype Crop Wild Relatives Portal, at the IMC Meeting (2007)
 
Semantic enrichment and similarity approximation for biomedical sequence images
Semantic enrichment and similarity approximation for biomedical sequence imagesSemantic enrichment and similarity approximation for biomedical sequence images
Semantic enrichment and similarity approximation for biomedical sequence images
 
The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...The UK National Chemical Database Service – an integration of commercial and ...
The UK National Chemical Database Service – an integration of commercial and ...
 

Viewers also liked

Health Datapalooza 2013: Datalab - Jim Craver
Health Datapalooza 2013: Datalab - Jim CraverHealth Datapalooza 2013: Datalab - Jim Craver
Health Datapalooza 2013: Datalab - Jim CraverHealth Data Consortium
 
Health Datapalooza: Bootcamp Idea Template Rowdmap Example
Health Datapalooza: Bootcamp Idea Template Rowdmap ExampleHealth Datapalooza: Bootcamp Idea Template Rowdmap Example
Health Datapalooza: Bootcamp Idea Template Rowdmap ExampleHealth Data Consortium
 
Health Datapalooza 2013: PCORI Challenge Annoucement
Health Datapalooza 2013: PCORI Challenge AnnoucementHealth Datapalooza 2013: PCORI Challenge Annoucement
Health Datapalooza 2013: PCORI Challenge AnnoucementHealth Data Consortium
 
Health Datapalooza 2013: State of the Art - CLIPMERGE
Health Datapalooza 2013: State of the Art - CLIPMERGEHealth Datapalooza 2013: State of the Art - CLIPMERGE
Health Datapalooza 2013: State of the Art - CLIPMERGEHealth Data Consortium
 
Health Datapalooza 2013: Data Design Diabetes Demo Day Intro
Health Datapalooza 2013: Data Design Diabetes Demo Day IntroHealth Datapalooza 2013: Data Design Diabetes Demo Day Intro
Health Datapalooza 2013: Data Design Diabetes Demo Day IntroHealth Data Consortium
 
Health Datapalooza 2013: Delos Cosgrove
Health Datapalooza 2013: Delos CosgroveHealth Datapalooza 2013: Delos Cosgrove
Health Datapalooza 2013: Delos CosgroveHealth Data Consortium
 

Viewers also liked (6)

Health Datapalooza 2013: Datalab - Jim Craver
Health Datapalooza 2013: Datalab - Jim CraverHealth Datapalooza 2013: Datalab - Jim Craver
Health Datapalooza 2013: Datalab - Jim Craver
 
Health Datapalooza: Bootcamp Idea Template Rowdmap Example
Health Datapalooza: Bootcamp Idea Template Rowdmap ExampleHealth Datapalooza: Bootcamp Idea Template Rowdmap Example
Health Datapalooza: Bootcamp Idea Template Rowdmap Example
 
Health Datapalooza 2013: PCORI Challenge Annoucement
Health Datapalooza 2013: PCORI Challenge AnnoucementHealth Datapalooza 2013: PCORI Challenge Annoucement
Health Datapalooza 2013: PCORI Challenge Annoucement
 
Health Datapalooza 2013: State of the Art - CLIPMERGE
Health Datapalooza 2013: State of the Art - CLIPMERGEHealth Datapalooza 2013: State of the Art - CLIPMERGE
Health Datapalooza 2013: State of the Art - CLIPMERGE
 
Health Datapalooza 2013: Data Design Diabetes Demo Day Intro
Health Datapalooza 2013: Data Design Diabetes Demo Day IntroHealth Datapalooza 2013: Data Design Diabetes Demo Day Intro
Health Datapalooza 2013: Data Design Diabetes Demo Day Intro
 
Health Datapalooza 2013: Delos Cosgrove
Health Datapalooza 2013: Delos CosgroveHealth Datapalooza 2013: Delos Cosgrove
Health Datapalooza 2013: Delos Cosgrove
 

Similar to Health Datapalooza 2013: Open Government Data - Natasha Noy

The Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to TerminologyThe Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to TerminologySnow Owl
 
NCBO Entry: Metadata Developer Chalenge
NCBO Entry: Metadata Developer ChalengeNCBO Entry: Metadata Developer Chalenge
NCBO Entry: Metadata Developer Chalengenatashafn
 
Semantic Web Technologies: A Paradigm for Medical Informatics
Semantic Web Technologies: A Paradigm for Medical InformaticsSemantic Web Technologies: A Paradigm for Medical Informatics
Semantic Web Technologies: A Paradigm for Medical InformaticsChimezie Ogbuji
 
Enhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataEnhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataBarry Smith
 
Beyond Transparency: Success & Lessons From tambisBoston2003
Beyond Transparency: Success & Lessons From tambisBoston2003Beyond Transparency: Success & Lessons From tambisBoston2003
Beyond Transparency: Success & Lessons From tambisBoston2003robertstevens65
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
 
WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...
WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...
WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...DATAVERSITY
 
Standards for clinical research data - steps to an information model (CRIM).
Standards for clinical research data - steps to an information model (CRIM).Standards for clinical research data - steps to an information model (CRIM).
Standards for clinical research data - steps to an information model (CRIM).Wolfgang Kuchinke
 
Ontology Web Services for Semantic Applications
Ontology Web Services for Semantic Applications Ontology Web Services for Semantic Applications
Ontology Web Services for Semantic Applications Trish Whetzel
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemWarren Kibbe
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSValery Tkachenko
 
Designing and launching the Clinical Reference Library
Designing and launching the Clinical Reference LibraryDesigning and launching the Clinical Reference Library
Designing and launching the Clinical Reference LibraryKerstin Forsberg
 
ELSS use cases and strategy
ELSS use cases and strategyELSS use cases and strategy
ELSS use cases and strategyAnton Yuryev
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectKen Karapetyan
 
Implementation and Use of ISO EN 13606 and openEHR
Implementation and Use of ISO EN 13606 and openEHRImplementation and Use of ISO EN 13606 and openEHR
Implementation and Use of ISO EN 13606 and openEHRKoray Atalag
 
Enabling Semantically Aware Software Applications
Enabling Semantically Aware Software Applications Enabling Semantically Aware Software Applications
Enabling Semantically Aware Software Applications Trish Whetzel
 
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...Syed Ahmad Chan Bukhari, PhD
 

Similar to Health Datapalooza 2013: Open Government Data - Natasha Noy (20)

The Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to TerminologyThe Logical Model Designer - Binding Information Models to Terminology
The Logical Model Designer - Binding Information Models to Terminology
 
NCBO Entry: Metadata Developer Chalenge
NCBO Entry: Metadata Developer ChalengeNCBO Entry: Metadata Developer Chalenge
NCBO Entry: Metadata Developer Chalenge
 
Semantic Web Technologies: A Paradigm for Medical Informatics
Semantic Web Technologies: A Paradigm for Medical InformaticsSemantic Web Technologies: A Paradigm for Medical Informatics
Semantic Web Technologies: A Paradigm for Medical Informatics
 
Enhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort DataEnhancing the Quality of ImmPort Data
Enhancing the Quality of ImmPort Data
 
Beyond Transparency: Success & Lessons From tambisBoston2003
Beyond Transparency: Success & Lessons From tambisBoston2003Beyond Transparency: Success & Lessons From tambisBoston2003
Beyond Transparency: Success & Lessons From tambisBoston2003
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...
WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...
WEBINAR: The Yosemite Project: An RDF Roadmap for Healthcare Information Inte...
 
Standards for clinical research data - steps to an information model (CRIM).
Standards for clinical research data - steps to an information model (CRIM).Standards for clinical research data - steps to an information model (CRIM).
Standards for clinical research data - steps to an information model (CRIM).
 
BioSharing - Update - Feb2016
BioSharing - Update - Feb2016BioSharing - Update - Feb2016
BioSharing - Update - Feb2016
 
Ontology Web Services for Semantic Applications
Ontology Web Services for Semantic Applications Ontology Web Services for Semantic Applications
Ontology Web Services for Semantic Applications
 
Data Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health SystemData Harmonization for a Molecularly Driven Health System
Data Harmonization for a Molecularly Driven Health System
 
Implementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTSImplementing chemistry platform for OpenPHACTS
Implementing chemistry platform for OpenPHACTS
 
A Case for linked Data for Medical Devices in the IVD Market
A Case for linked Data for Medical Devices in the IVD MarketA Case for linked Data for Medical Devices in the IVD Market
A Case for linked Data for Medical Devices in the IVD Market
 
Designing and launching the Clinical Reference Library
Designing and launching the Clinical Reference LibraryDesigning and launching the Clinical Reference Library
Designing and launching the Clinical Reference Library
 
ELSS use cases and strategy
ELSS use cases and strategyELSS use cases and strategy
ELSS use cases and strategy
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
Open innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts projectOpen innovation contributions from RSC resulting from the Open Phacts project
Open innovation contributions from RSC resulting from the Open Phacts project
 
Implementation and Use of ISO EN 13606 and openEHR
Implementation and Use of ISO EN 13606 and openEHRImplementation and Use of ISO EN 13606 and openEHR
Implementation and Use of ISO EN 13606 and openEHR
 
Enabling Semantically Aware Software Applications
Enabling Semantically Aware Software Applications Enabling Semantically Aware Software Applications
Enabling Semantically Aware Software Applications
 
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
Leveraging CEDAR workbench for ontology-linked submission of adaptive immune ...
 

Health Datapalooza 2013: Open Government Data - Natasha Noy

  • 1. Healthdata.gov Metadata: Lifting Schemes and Controlled Vocabularies Mark Musen Natasha Noy National Center for Biomedical Ontology Stanford Center for Biomedical Informatics Research Stanford University
  • 2. National Center for Biomedical Ontology • We create and maintain a library of biomedical ontologies and terminologies. • We build tools and Web services to enable the use of ontologies and terminologies. • We collaborate with scientific communities that develop and use ontologies and terminologies in biomedicine.
  • 3. Controlled Vocabularies in Healthcare • Healthcare Common Procedure Coding System (HCPCS) • Current Procedural Terminology (CPT) • International Classification of Diseases (ICD) • RxNorm • The National Drug File - Reference Terminology (NDF-RT)
  • 4.
  • 5. National Center for Biomedical Ontology • Provides key technology for – Describing medical terminologies – Accessing information about terms in medical terminologies – Using mappings across terminologies – Annotating content with terms from medical terminologies
  • 6. BioPortal stores Big Data about Big Data
  • 7. Mappings Between Terminologies • Available through a REST API and SPARQL endpoint • Example: Term mappings from HCPCS to CPT (figure callouts: source term; mapped to CPT lexically and through UMLS CUI; mapping metadata)
  • 8.
  • 9.
  • 10. Protégé: Editing Ontologies and Terminologies • An open-source ontology editor • Has more than 200,000 registered users • Works in a web browser • Supports collaboration • Built on Semantic Web standards • Has dozens of plugins developed by our user community
  • 13. The Anatomy of a Dataset • Basic metadata about the dataset: published by HHS, describes Part B National Summary Data File, covers the period from Jan 1, 2000 to Dec 31, 2000, ... • Metadata about the structure of the dataset: allowed services, charges, and payments for each HCPCS code and modifier • Content of the dataset: charge of $4,966 for radiology procedure • Focus of the Metadata Challenge
  • 14. The NCBO Response to the Healthdata.gov Metadata Challenge • Protégé to create dataset descriptions • Custom scripts to extract metadata • BioPortal to provide terminologies • SPARQL endpoint to access the resulting knowledge graph
  • 15. NCBO Solution: Outline • Basic metadata about the dataset • Metadata about the structure of the dataset • Anchoring values in controlled terminologies http://healthdata.bioontology.org
  • 16. Bring Healthdata.gov Datasets into the Linked Open Data Cloud
  • 17. Step 1: Basic Metadata About the Dataset
  • 18. Links to Additional Metadata and Vocabularies
  • 19. Linking Healthdata Datasets to Other Metadata • Find all reports authored by the Department of Health and Human Services and its agencies annually
  • 20. Ask it in SPARQL: Find all reports authored by the Department of Health and Human Services and its agencies annually
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX dc: <http://purl.org/dc/terms/>
        PREFIX dbpedia: <http://dbpedia.org/resource/>
        PREFIX org: <http://www.w3.org/ns/org#>
        SELECT DISTINCT ?title WHERE {
          # datasets published on an annual cycle
          ?ds dc:accrualPeriodicity ?per .
          ?per rdfs:label "Annual" .
          # created by an agency that is a sub-organization of HHS
          ?ds dc:creator ?ag .
          ?ag org:subOrganizationOf dbpedia:United_States_Department_of_Health_and_Human_Services .
          ?ds dc:title ?title
        } GROUP BY ?title
  • 22. NCBO Solution: Outline • Basic metadata about the dataset • Metadata about the structure of the dataset • Anchoring values in controlled terminologies http://healthdata.bioontology.org
  • 23. There Is No Free Lunch • Modeling metadata enables us to understand how the content is related • Necessary if we want to integrate data
  • 24. Dataset Descriptions Are Buried in Text Files • Part B National Summary Data File: Map HCPCS code and HCPCS modifier to the amount charged for allowed services, allowed charges, and payment • RDF Data Cube Vocabulary
  • 25. Modeling Dimensions and Measures of the data
        – PartB National Summary data (rdf:type: qb:Dataset, dcat:Dataset) qb:structure DSD (rdf:type: qb:DataStructureDefinition)
        – DSD qb:component HCPCS/CPT Payments (rdf:type: qb:ComponentSpecification)
        – HCPCS/CPT Payments qb:dimension HCPCS or CPT Code (rdf:type: qb:CodedProperty, qb:DimensionProperty)
        – HCPCS/CPT Payments qb:dimension HCPCS or CPT modifier (rdf:type: qb:CodedProperty, qb:DimensionProperty)
        – HCPCS/CPT Payments qb:measure Number of services (rdf:type: qb:MeasureProperty)
        – HCPCS/CPT Payments qb:measure Allowed charges (rdf:type: qb:MeasureProperty)
        – HCPCS/CPT Payments qb:measure Payment for services (rdf:type: qb:MeasureProperty)
        – HCPCS or CPT Code qb:codeList HCPCS codes, CPT codes (each rdf:type: skos:ConceptScheme)
        – HCPCS or CPT modifier qb:codeList HCPCS modifiers, CPT modifiers (each rdf:type: skos:ConceptScheme)
  • 26. Structured dataset definition: ~20 triples • Dataset content: ~10,000−1,000,000 rows
  • 27. Find datasets that map HCPCS and CPT codes to number of services for each code
        PREFIX dcat: <http://www.w3.org/ns/dcat#>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX qb: <http://purl.org/linked-data/cube#>
        PREFIX bp: <http://purl.bioontology.org/ontology/>
        PREFIX dsd: <http://purl.bioontology.org/healthdata/dsd/>
        SELECT DISTINCT ?title WHERE {
          ?dataset a dcat:Dataset .
          ?dataset qb:structure ?dsd .
          ?dsd qb:component ?cmp .
          ?cmp qb:dimension ?dim .
          ?dataset rdfs:label ?title .
          ?dim qb:codeList bp:CPT .
          ?cmp qb:measure dsd:number-of-services .
        }
  • 29. NCBO Solution: Outline • Basic metadata about the dataset • Metadata about the structure of the dataset • Anchoring values in controlled terminologies http://healthdata.bioontology.org
  • 30. Controlled Vocabularies in Healthcare • Healthcare Common Procedure Coding System (HCPCS) • Current Procedural Terminology (CPT) • International Classification of Diseases (ICD) • RxNorm • The National Drug File - Reference Terminology (NDF-RT)
  • 31. NCBO BioPortal • Uniform access to 330 public ontologies and terminologies in biomedicine • Web interface • REST API • Search across all ontologies • Resolvable URIs for each term • Mappings between terms in different ontologies
  • 32. From Codes to Ontology Terms
  • 33. BioPortal as a Terminology Service • Resolvable Uniform Resource Identifiers (URIs) • Mappings to other terminologies – Including metadata – Including multiple mappings from a variety of sources • Regular terminology updates from primary sources • Ability to define and upload new value sets – We defined several value sets for the Metadata Challenge – Linked them to other terminologies in BioPortal
  • 35. NCBO Solution: Outline Metadata about the structure of the dataset Basic metadata about the dataset Anchoring in controlled terminologies http://healthdata.bioontology.org
  • 36. Recipe for Linking Government Data • Describe the information about each dataset – Specify provenance – Use standard representation schemes – Use consensus vocabularies • Describe the metadata about the content of the dataset – Involve domain experts who understand the structure of the data – Use consensus vocabularies (e.g., W3C RDF Cube) • Anchor the values in controlled vocabularies and datasets – Use existing terminologies – Define and publish value sets if none exist – Map the value sets to standard terminologies
  • 37. What we have learned from the challenge • Our entry demonstrated the feasibility of modeling metadata and linking it to standard vocabularies – Uses a top-down approach – Enables “deep” integration – Extracts knowledge from data • We need tools and scripts that are specific to the task of dataset description
  • 38. NCBO Solution: Outline Metadata about the structure of the dataset Basic metadata about the dataset Anchoring in controlled terminologies http://healthdata.bioontology.org
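
A rough illustration of how the structural metadata on slide 25 answers the question on slide 27 ("find datasets that map HCPCS and CPT codes to number of services"): the sketch below stores a handful of triples in a plain Python set and walks the same path as the SPARQL graph pattern (dataset, qb:structure, qb:component, then qb:dimension with a CPT code list plus the number-of-services measure). It is only a toy under stated assumptions. The `ex:` identifiers, the second dataset, and the helper names `objects` and `datasets_with_cpt_service_counts` are hypothetical and do not come from the deck, and a real deployment would run the query against an RDF store rather than a Python set.

```python
# Toy in-memory triple store: each triple is (subject, predicate, object).
# The "ex:" names are illustrative stand-ins, not the deck's actual URIs.
TRIPLES = {
    ("ex:partB", "rdf:type", "dcat:Dataset"),
    ("ex:partB", "rdfs:label", "Part B National Summary Data File"),
    ("ex:partB", "qb:structure", "ex:partB-dsd"),
    ("ex:partB-dsd", "qb:component", "ex:payments"),
    ("ex:payments", "qb:dimension", "ex:code"),
    ("ex:payments", "qb:measure", "dsd:number-of-services"),
    ("ex:code", "qb:codeList", "bp:CPT"),
    # A hypothetical second dataset with no CPT-coded dimension; it should not match.
    ("ex:other", "rdf:type", "dcat:Dataset"),
    ("ex:other", "rdfs:label", "Some other dataset"),
    ("ex:other", "qb:structure", "ex:other-dsd"),
    ("ex:other-dsd", "qb:component", "ex:other-cmp"),
    ("ex:other-cmp", "qb:measure", "dsd:number-of-services"),
}


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the store."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def datasets_with_cpt_service_counts():
    """Titles of datasets whose component has a CPT-coded dimension
    and the number-of-services measure (same shape as the slide 27 pattern)."""
    titles = set()
    datasets = {s for s, p, o in TRIPLES if p == "rdf:type" and o == "dcat:Dataset"}
    for ds in datasets:
        for dsd in objects(ds, "qb:structure"):
            for component in objects(dsd, "qb:component"):
                has_measure = "dsd:number-of-services" in objects(component, "qb:measure")
                has_cpt_dimension = any(
                    "bp:CPT" in objects(dim, "qb:codeList")
                    for dim in objects(component, "qb:dimension")
                )
                if has_measure and has_cpt_dimension:
                    titles |= objects(ds, "rdfs:label")
    return sorted(titles)


if __name__ == "__main__":
    print(datasets_with_cpt_service_counts())
```

The point of the sketch is the one the deck makes: the question is answered from the structure definition alone (about a dozen triples here), without touching the dataset rows themselves.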

Editor's Notes

  1. 400K terms
  2. Make bigger
  3. Secret sauce: going into rdf
  4. Authored by. Show the query and what comes back. LOD cloud
  5. Anchoring what?