Dataset Descriptions in Open PHACTS and W3C HCLS IG
Alasdair J G Gray
Heriot-Watt University
www.alasdairjggray.co.uk A.J.G.Gray@hw.ac.uk
NDEx Call, April 2014
[Figure: Open PHACTS platform architecture. Labels: data sources (Db, Nanopub, VoID) grouped as Public Content, Commercial, Public Ontologies and User Annotations; Data Cache (Virtuoso Triple Store); Semantic Workflow Engine; Linked Data API (RDF/XML, TTL, JSON); Domain Specific Services; Identity Resolution Service; Chemistry Registration, Normalisation & Q/C; Identifier Management Service; Indexing; Core Platform; Apps. Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”.]
[Figure: the same platform architecture with dataset versions highlighted. Additional labels: ChEMBL, ChEMBL-RDF, Chem2Bio2RDF, SD, Apps; versions v13, v12, v2 or v8.]
ChemSpider
• Data aggregator: over 400 sources
– What data does it contain?
– What version of ?? did they load?
– When are new versions loaded?
• OPS data covers
– ChEBI
– ChEMBL
– DrugBank
2 April 2014 OPS Dataset Descriptions – A. J. G. Gray 5
Metadata Challenges
• Datasets available
– In many versions over time
– In different formats
– From many mirrors/registries
• Datasets build on each other
• Files do not carry metadata
• Registries
– Can be out-of-date
– Can contain conflicting information
Users require data provenance!
Description Model
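To make the idea of a dataset description concrete, the following is a minimal sketch in Python that emits a VoID description of one dataset version as Turtle. The choice of properties (dcterms:title, pav:version, dcterms:license, dcterms:issued, void:dataDump) is an assumption made for illustration and is not the description model shown on the slide; the helper names and all example.org URIs are hypothetical.

```python
from datetime import date

# Namespaces of the vocabularies used in this sketch.
PREFIXES = {
    "void": "http://rdfs.org/ns/void#",
    "dcterms": "http://purl.org/dc/terms/",
    "pav": "http://purl.org/pav/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def turtle_literal(text):
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def describe_dataset(uri, title, version, license_uri, issued, dump_uri):
    """Return a Turtle string describing one dataset version.

    The property selection is an illustrative assumption, not a
    specification of what Open PHACTS or the HCLS profile requires.
    """
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines += [
        "",
        f"<{uri}> a void:Dataset ;",
        f"    dcterms:title {turtle_literal(title)}@en ;",
        f"    pav:version {turtle_literal(version)} ;",
        f"    dcterms:license <{license_uri}> ;",
        f'    dcterms:issued "{issued.isoformat()}"^^xsd:date ;',
        f"    void:dataDump <{dump_uri}> .",
    ]
    return "\n".join(lines)


# Hypothetical dataset; every value below is a placeholder.
example = describe_dataset(
    uri="http://example.org/dataset/demo/v1",
    title="Demo dataset",
    version="1",
    license_uri="http://example.org/licence",
    issued=date(2014, 4, 2),
    dump_uri="http://example.org/dataset/demo/v1/dump.ttl",
)
print(example)
```

Recording the version and issue date next to the download location is what lets a consumer answer the "what version did they load?" question from the description alone.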
Realisation of Dataset Descriptions
• Needs to be incorporated into data publishing pipeline
• Hard for publishers to provide conformant descriptions
– Datasets are complex
– Evolve over time
– Seen as yet another burden
VoID Editor
Validator
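The kind of check a validator performs can be sketched as follows: given a description, report which required properties are missing or empty. This is a minimal Python illustration and not the Open PHACTS validator itself; the list of required properties and the dictionary representation (property name mapped to a list of values) are assumptions.

```python
# Assumed set of required properties, for illustration only.
REQUIRED_PROPERTIES = (
    "dcterms:title",
    "dcterms:description",
    "dcterms:license",
    "pav:version",
)


def missing_properties(description, required=REQUIRED_PROPERTIES):
    """Return the required properties that are absent or have no non-empty value."""
    missing = []
    for prop in required:
        values = description.get(prop, [])
        if not any(str(value).strip() for value in values):
            missing.append(prop)
    return missing


# Hypothetical draft description: no description text, and an empty version.
draft = {
    "dcterms:title": ["Demo dataset"],
    "dcterms:license": ["http://example.org/licence"],
    "pav:version": [""],
}
report = missing_properties(draft)
print(report)  # ['dcterms:description', 'pav:version']
```

Returning the list of problems, rather than a single pass or fail flag, gives a publisher something actionable to fix before the description is released.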
W3C HCLS Group
HCLS Community Profile Model
Future Vision
Metadata: Write once, use many times
• Provide a rich and accurate provenance trail of data
– Automatic pipeline from VoID file to registries
• Align Open PHACTS with W3C HCLS
– Update tools for HCLS profile
A.J.G.Gray@hw.ac.uk
www.alasdairjggray.co.uk
www.openphacts.org

More Related Content

What's hot

Seamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSyncSeamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSync
petrknoth
 
Research Plan 2014
Research Plan 2014Research Plan 2014
Research Plan 2014
Alejandro Llaves
 
Rdf saturator
Rdf saturatorRdf saturator
Rdf saturator
INRIA-OAK
 
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic DatasetsDiversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
Adila Krisnadhi
 
Whowas: History of resources at APNIC
Whowas: History of resources at APNICWhowas: History of resources at APNIC
Whowas: History of resources at APNIC
APNIC
 
R reproducibility
R reproducibilityR reproducibility
R reproducibility
Revolution Analytics
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per Second
Revolution Analytics
 
apidays LIVE Paris 2021 - GraphQL Today and Tomorrow by Uri Goldshtein, The G...
apidays LIVE Paris 2021 - GraphQL Today and Tomorrow by Uri Goldshtein, The G...apidays LIVE Paris 2021 - GraphQL Today and Tomorrow by Uri Goldshtein, The G...
apidays LIVE Paris 2021 - GraphQL Today and Tomorrow by Uri Goldshtein, The G...
apidays
 
R Then and Now
R Then and NowR Then and Now
R Then and Now
Revolution Analytics
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archiveLewis Crawford
 
Clustering Search to Navigate A Case Study of the Canadian World Wide Web as ...
Clustering Search to Navigate A Case Study of the Canadian World Wide Web as ...Clustering Search to Navigate A Case Study of the Canadian World Wide Web as ...
Clustering Search to Navigate A Case Study of the Canadian World Wide Web as ...
Ian Milligan
 
4Science presentes: ORCiD API Tutorial
4Science presentes: ORCiD API Tutorial4Science presentes: ORCiD API Tutorial
4Science presentes: ORCiD API Tutorial
4Science
 
S3 VFD
S3 VFDS3 VFD
DSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstreamDSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstream
Andrea Bollini
 
Semantically-Enabled Digital Investigations
Semantically-Enabled Digital InvestigationsSemantically-Enabled Digital Investigations
Semantically-Enabled Digital Investigations
inbroker
 
ICIC 2013 New Product Introductions Minesoft
ICIC 2013 New Product Introductions MinesoftICIC 2013 New Product Introductions Minesoft
ICIC 2013 New Product Introductions MinesoftDr. Haxel Consult
 
New Product Introductions - FIZ Karlsruhe
New Product Introductions - FIZ KarlsruheNew Product Introductions - FIZ Karlsruhe
New Product Introductions - FIZ Karlsruhe
Dr. Haxel Consult
 
Implementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache FlinkImplementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache Flink
Márton Balassi
 
Exploring linked data in r
Exploring linked data in rExploring linked data in r
Exploring linked data in r
David Sherlock
 
BBC News Labs at ISKO Conference, UCL, London - July 2013
BBC News Labs at ISKO Conference, UCL, London - July 2013BBC News Labs at ISKO Conference, UCL, London - July 2013
BBC News Labs at ISKO Conference, UCL, London - July 2013
BBC News Labs
 

What's hot (20)

Seamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSyncSeamless access to the world’s open access research papers via ResourceSync
Seamless access to the world’s open access research papers via ResourceSync
 
Research Plan 2014
Research Plan 2014Research Plan 2014
Research Plan 2014
 
Rdf saturator
Rdf saturatorRdf saturator
Rdf saturator
 
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic DatasetsDiversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
Diversity++2015 talk: R2R+BCO-DMO - Linked Oceanographic Datasets
 
Whowas: History of resources at APNIC
Whowas: History of resources at APNICWhowas: History of resources at APNIC
Whowas: History of resources at APNIC
 
R reproducibility
R reproducibilityR reproducibility
R reproducibility
 
Predicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per SecondPredicting Loan Delinquency at One Million Transactions per Second
Predicting Loan Delinquency at One Million Transactions per Second
 
apidays LIVE Paris 2021 - GraphQL Today and Tomorrow by Uri Goldshtein, The G...
apidays LIVE Paris 2021 - GraphQL Today and Tomorrow by Uri Goldshtein, The G...apidays LIVE Paris 2021 - GraphQL Today and Tomorrow by Uri Goldshtein, The G...
apidays LIVE Paris 2021 - GraphQL Today and Tomorrow by Uri Goldshtein, The G...
 
R Then and Now
R Then and NowR Then and Now
R Then and Now
 
Analytics and Access to the UK web archive
Analytics and Access to the UK web archiveAnalytics and Access to the UK web archive
Analytics and Access to the UK web archive
 
Clustering Search to Navigate A Case Study of the Canadian World Wide Web as ...
Clustering Search to Navigate A Case Study of the Canadian World Wide Web as ...Clustering Search to Navigate A Case Study of the Canadian World Wide Web as ...
Clustering Search to Navigate A Case Study of the Canadian World Wide Web as ...
 
4Science presentes: ORCiD API Tutorial
4Science presentes: ORCiD API Tutorial4Science presentes: ORCiD API Tutorial
4Science presentes: ORCiD API Tutorial
 
S3 VFD
S3 VFDS3 VFD
S3 VFD
 
DSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstreamDSpace-CRIS: new features and contribution to the DSpace mainstream
DSpace-CRIS: new features and contribution to the DSpace mainstream
 
Semantically-Enabled Digital Investigations
Semantically-Enabled Digital InvestigationsSemantically-Enabled Digital Investigations
Semantically-Enabled Digital Investigations
 
ICIC 2013 New Product Introductions Minesoft
ICIC 2013 New Product Introductions MinesoftICIC 2013 New Product Introductions Minesoft
ICIC 2013 New Product Introductions Minesoft
 
New Product Introductions - FIZ Karlsruhe
New Product Introductions - FIZ KarlsruheNew Product Introductions - FIZ Karlsruhe
New Product Introductions - FIZ Karlsruhe
 
Implementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache FlinkImplementing BigPetStore with Apache Flink
Implementing BigPetStore with Apache Flink
 
Exploring linked data in r
Exploring linked data in rExploring linked data in r
Exploring linked data in r
 
BBC News Labs at ISKO Conference, UCL, London - July 2013
BBC News Labs at ISKO Conference, UCL, London - July 2013BBC News Labs at ISKO Conference, UCL, London - July 2013
BBC News Labs at ISKO Conference, UCL, London - July 2013
 

Viewers also liked

Bota papa noel_foamy
Bota papa noel_foamyBota papa noel_foamy
Bota papa noel_foamy
Nancy Pulido Arcos
 
SensorBench
SensorBenchSensorBench
SensorBench
Alasdair Gray
 
Things to see in london
Things to see in londonThings to see in london
Things to see in london
lmazuelasg
 
Data Linkage
Data LinkageData Linkage
Data Linkage
Alasdair Gray
 
Including Co-Referent URIs in a SPARQL Query
Including Co-Referent URIs in a SPARQL QueryIncluding Co-Referent URIs in a SPARQL Query
Including Co-Referent URIs in a SPARQL Query
Alasdair Gray
 
Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...
Alasdair Gray
 
Noti átomo
Noti átomoNoti átomo
Noti átomo
Nancy Pulido Arcos
 
Data Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case StudyData Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case Study
Alasdair Gray
 
Sensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingSensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingAlasdair Gray
 
Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Incorporating Commercial and Private Data into an Open Linked Data Platform f...Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Alasdair Gray
 
Sistema glandular
Sistema glandularSistema glandular
Sistema glandular
Nancy Pulido Arcos
 
Ed pronunciation
Ed pronunciationEd pronunciation
Ed pronunciation
lmazuelasg
 
2013 01-14 ops-dataset_descriptions
2013 01-14 ops-dataset_descriptions2013 01-14 ops-dataset_descriptions
2013 01-14 ops-dataset_descriptions
Alasdair Gray
 
Bota navidad
Bota navidadBota navidad
Bota navidad
Nancy Pulido Arcos
 
Data Science meets Linked Data
Data Science meets Linked DataData Science meets Linked Data
Data Science meets Linked Data
Alasdair Gray
 
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and DistributionsThe HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
Alasdair Gray
 
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Alasdair Gray
 

Viewers also liked (18)

Bota papa noel_foamy
Bota papa noel_foamyBota papa noel_foamy
Bota papa noel_foamy
 
SensorBench
SensorBenchSensorBench
SensorBench
 
Things to see in london
Things to see in londonThings to see in london
Things to see in london
 
Data Linkage
Data LinkageData Linkage
Data Linkage
 
Including Co-Referent URIs in a SPARQL Query
Including Co-Referent URIs in a SPARQL QueryIncluding Co-Referent URIs in a SPARQL Query
Including Co-Referent URIs in a SPARQL Query
 
Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...
 
Noti átomo
Noti átomoNoti átomo
Noti átomo
 
Data Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case StudyData Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case Study
 
Sensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingSensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-being
 
Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Incorporating Commercial and Private Data into an Open Linked Data Platform f...Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Incorporating Commercial and Private Data into an Open Linked Data Platform f...
 
Sistema glandular
Sistema glandularSistema glandular
Sistema glandular
 
Ed pronunciation
Ed pronunciationEd pronunciation
Ed pronunciation
 
2013 01-14 ops-dataset_descriptions
2013 01-14 ops-dataset_descriptions2013 01-14 ops-dataset_descriptions
2013 01-14 ops-dataset_descriptions
 
Bota navidad
Bota navidadBota navidad
Bota navidad
 
mit gclog
mit gclogmit gclog
mit gclog
 
Data Science meets Linked Data
Data Science meets Linked DataData Science meets Linked Data
Data Science meets Linked Data
 
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and DistributionsThe HCLS Community Profile: Describing Datasets, Versions, and Distributions
The HCLS Community Profile: Describing Datasets, Versions, and Distributions
 
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
Tutorial: Describing Datasets with the Health Care and Life Sciences Communit...
 

Similar to Dataset Descriptions in Open PHACTS and HCLS

Arabidopsis Information Portal, Developer Workshop 2014, Introduction
Arabidopsis Information Portal, Developer Workshop 2014, IntroductionArabidopsis Information Portal, Developer Workshop 2014, Introduction
Arabidopsis Information Portal, Developer Workshop 2014, Introduction
JasonRafeMiller
 
The new CIARD RING , a machine-readable directory of datasets for agriculture
The new CIARD RING, a machine-readable directory of datasets for agricultureThe new CIARD RING, a machine-readable directory of datasets for agriculture
The new CIARD RING , a machine-readable directory of datasets for agriculture
Valeria Pesce
 
Interoperability is the key: repositories networks promoting the quality and ...
Interoperability is the key: repositories networks promoting the quality and ...Interoperability is the key: repositories networks promoting the quality and ...
Interoperability is the key: repositories networks promoting the quality and ...
Pedro Príncipe
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
Giorgos Santipantakis
 
The CIARD RING , a global directory of datasets for agriculture, by Valeria P...
The CIARD RING, a global directory of datasets for agriculture, by Valeria P...The CIARD RING, a global directory of datasets for agriculture, by Valeria P...
The CIARD RING , a global directory of datasets for agriculture, by Valeria P...
CIARD Movement
 
Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014
aceas13tern
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And Visualization
Ivan Ermilov
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
Sion Smith
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
Sanjay Padhi, Ph.D
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapubeswcsummerschool
 
apidays LIVE Paris 2021 - Stargate.io, An OSS Api Layer for your Cassandra by...
apidays LIVE Paris 2021 - Stargate.io, An OSS Api Layer for your Cassandra by...apidays LIVE Paris 2021 - Stargate.io, An OSS Api Layer for your Cassandra by...
apidays LIVE Paris 2021 - Stargate.io, An OSS Api Layer for your Cassandra by...
apidays
 
CPaaS.io Y1 Review Meeting - Holistic Data Management
CPaaS.io Y1 Review Meeting - Holistic Data ManagementCPaaS.io Y1 Review Meeting - Holistic Data Management
CPaaS.io Y1 Review Meeting - Holistic Data Management
Stephan Haller
 
Arabidopsis Information Portal overview from Plant Biology Europe 2014
Arabidopsis Information Portal overview from Plant Biology Europe 2014Arabidopsis Information Portal overview from Plant Biology Europe 2014
Arabidopsis Information Portal overview from Plant Biology Europe 2014
Matthew Vaughn
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
Enrico Daga
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
Carole Goble
 
Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...
Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...
Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...
Milan Dojchinovski
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overview
BigData_Europe
 
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
Axel Reichwein
 
RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4
Marin Dimitrov
 

Similar to Dataset Descriptions in Open PHACTS and HCLS (20)

Arabidopsis Information Portal, Developer Workshop 2014, Introduction
Arabidopsis Information Portal, Developer Workshop 2014, IntroductionArabidopsis Information Portal, Developer Workshop 2014, Introduction
Arabidopsis Information Portal, Developer Workshop 2014, Introduction
 
The new CIARD RING , a machine-readable directory of datasets for agriculture
The new CIARD RING, a machine-readable directory of datasets for agricultureThe new CIARD RING, a machine-readable directory of datasets for agriculture
The new CIARD RING , a machine-readable directory of datasets for agriculture
 
Interoperability is the key: repositories networks promoting the quality and ...
Interoperability is the key: repositories networks promoting the quality and ...Interoperability is the key: repositories networks promoting the quality and ...
Interoperability is the key: repositories networks promoting the quality and ...
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 
STI Summit 2011 - Linked data-services-streams
STI Summit 2011 - Linked data-services-streamsSTI Summit 2011 - Linked data-services-streams
STI Summit 2011 - Linked data-services-streams
 
The CIARD RING , a global directory of datasets for agriculture, by Valeria P...
The CIARD RING, a global directory of datasets for agriculture, by Valeria P...The CIARD RING, a global directory of datasets for agriculture, by Valeria P...
The CIARD RING , a global directory of datasets for agriculture, by Valeria P...
 
Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014
 
Data Integration And Visualization
Data Integration And VisualizationData Integration And Visualization
Data Integration And Visualization
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
Tag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh PlatformTag.bio: Self Service Data Mesh Platform
Tag.bio: Self Service Data Mesh Platform
 
Wed roman tut_open_datapub
Wed roman tut_open_datapubWed roman tut_open_datapub
Wed roman tut_open_datapub
 
apidays LIVE Paris 2021 - Stargate.io, An OSS Api Layer for your Cassandra by...
apidays LIVE Paris 2021 - Stargate.io, An OSS Api Layer for your Cassandra by...apidays LIVE Paris 2021 - Stargate.io, An OSS Api Layer for your Cassandra by...
apidays LIVE Paris 2021 - Stargate.io, An OSS Api Layer for your Cassandra by...
 
CPaaS.io Y1 Review Meeting - Holistic Data Management
CPaaS.io Y1 Review Meeting - Holistic Data ManagementCPaaS.io Y1 Review Meeting - Holistic Data Management
CPaaS.io Y1 Review Meeting - Holistic Data Management
 
Arabidopsis Information Portal overview from Plant Biology Europe 2014
Arabidopsis Information Portal overview from Plant Biology Europe 2014Arabidopsis Information Portal overview from Plant Biology Europe 2014
Arabidopsis Information Portal overview from Plant Biology Europe 2014
 
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL EndpointsA BASILar Approach for Building Web APIs on top of SPARQL Endpoints
A BASILar Approach for Building Web APIs on top of SPARQL Endpoints
 
FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout FAIR Workflows and Research Objects get a Workout
FAIR Workflows and Research Objects get a Workout
 
Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...
Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...
Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...
 
BDE SC3.3 Workshop - BDE Platform: Technical overview
 BDE SC3.3 Workshop -  BDE Platform: Technical overview BDE SC3.3 Workshop -  BDE Platform: Technical overview
BDE SC3.3 Workshop - BDE Platform: Technical overview
 
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
Open Services for Lifecycle Collaboration (OSLC) - Extending REST APIs to Con...
 
RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4RDF Database-as-a-Service with S4
RDF Database-as-a-Service with S4
 

More from Alasdair Gray

Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Alasdair Gray
 
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Alasdair Gray
 
An Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland ProjectAn Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland Project
Alasdair Gray
 
Supporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesSupporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life Sciences
Alasdair Gray
 
Validata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceValidata: A tool for testing profile conformance
Validata: A tool for testing profile conformance
Alasdair Gray
 
Open PHACTS: The Data Today
Open PHACTS: The Data TodayOpen PHACTS: The Data Today
Open PHACTS: The Data Today
Alasdair Gray
 
Project X
Project XProject X
Project X
Alasdair Gray
 
Data Integration in a Big Data Context
Data Integration in a Big Data ContextData Integration in a Big Data Context
Data Integration in a Big Data Context
Alasdair Gray
 
Scientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry dataScientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry data
Alasdair Gray
 
Describing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileDescribing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community Profile
Alasdair Gray
 
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...Alasdair Gray
 
Computing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery DatasetsComputing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery Datasets
Alasdair Gray
 

More from Alasdair Gray (12)

Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
 
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
 
An Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland ProjectAn Identifier Scheme for the Digitising Scotland Project
An Identifier Scheme for the Digitising Scotland Project
 
Supporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life SciencesSupporting Dataset Descriptions in the Life Sciences
Supporting Dataset Descriptions in the Life Sciences
 
Validata: A tool for testing profile conformance
Validata: A tool for testing profile conformanceValidata: A tool for testing profile conformance
Validata: A tool for testing profile conformance
 
Open PHACTS: The Data Today
Open PHACTS: The Data TodayOpen PHACTS: The Data Today
Open PHACTS: The Data Today
 
Project X
Project XProject X
Project X
 
Data Integration in a Big Data Context
Data Integration in a Big Data ContextData Integration in a Big Data Context
Data Integration in a Big Data Context
 
Scientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry dataScientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry data
 
Describing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileDescribing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community Profile
 
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
 
Computing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery DatasetsComputing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery Datasets
 

Dataset Descriptions in Open PHACTS and HCLS

  • 1. Dataset Descriptions in Open PHACTS and W3C HCLS IG Alasdair J G Gray Heriot-Watt University www.alasdairjggray.co.uk A.J.G.Gray@hw.ac.uk NDEx Call, April 2014
  • 2. Architecture diagram (labels): Apps; Linked Data API (RDF/XML, TTL, JSON); Semantic Workflow Engine; Data Cache (Virtuoso Triple Store); Core Platform; Domain Specific Services; Identity Resolution Service; Identifier Management Service; Chemistry Registration, Normalisation & Q/C; Indexing; example identifiers P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”; data sources labelled Db, Nanopub and VoID; Public Content; Commercial; Public Ontologies; User Annotations
  • 3. Architecture diagram with dataset versions (labels): Apps; Linked Data API (RDF/XML, TTL, JSON); Semantic Workflow Engine; Data Cache (Triple Store); Core Platform; Domain Specific Services; Identity Resolution Service; Identifier Management Service; example identifiers P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”; ChEMBL-RDF; ChEMBL; Chem2Bio2RDF; SD; v13; v12; v2 or v8
  • 4.
  • 5. ChemSpider
      • Data aggregator: over 400 sources
          – What data does it contain?
          – What version of ?? did they load?
          – When are new versions loaded?
      • OPS data covers
          – ChEBI
          – ChEMBL
          – DrugBank
      2 April 2014 OPS Dataset Descriptions – A. J. G. Gray 5
  • 6. Metadata Challenges
      • Datasets available
          – In many versions over time
          – In different formats
          – From many mirrors/registries
      • Datasets build on each other
      • Files do not carry metadata
      • Registries
          – Can be out-of-date
          – Can contain conflicting information
      Users require data provenance!
  • 7.
  • 8.
  • 9. Description Model
  • 10. Realisation of Dataset Descriptions
      • Needs to be incorporated into data publishing pipeline
      • Hard for publishers to provide conformant descriptions
          – Datasets are complex
          – Evolve over time
          – Seen as yet another burden
  • 11. VoID Editor
  • 12. Validator
  • 14. HCLS Community Profile Model
  • 15. Future Vision
      Metadata: Write once, use many times
      • Provide rich and accurate provenance trail of data
          – Automatic pipeline from VoID file to registries
      • Align Open PHACTS with W3C HCLS
          – Update tools for HCLS profile
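
To make the idea of a machine-readable dataset description concrete, here is a minimal Python sketch that renders a VoID description as Turtle. It is only an illustration: the dataset URI, download URL, date, choice of properties and the helper name `render_void` are all assumptions, and none of it is taken from the Open PHACTS guidelines or the VoID Editor.

```python
from datetime import date

# Namespaces of the vocabularies used in this illustration.
PREFIXES = {
    "void": "http://rdfs.org/ns/void#",
    "dcterms": "http://purl.org/dc/terms/",
    "pav": "http://purl.org/pav/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def render_void(uri, title, version, issued, dump_url):
    """Return a Turtle string describing one dataset version (illustrative only).

    Note: the title is inserted verbatim, so it must not contain double quotes.
    """
    lines = [f"@prefix {prefix}: <{ns}> ." for prefix, ns in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{uri}> a void:Dataset ;")
    lines.append(f'    dcterms:title "{title}"@en ;')
    lines.append(f'    pav:version "{version}" ;')
    lines.append(f'    dcterms:issued "{issued.isoformat()}"^^xsd:date ;')
    lines.append(f"    void:dataDump <{dump_url}> .")
    return "\n".join(lines)


turtle = render_void(
    uri="http://example.org/dataset/chembl-rdf/13",        # hypothetical URI
    title="ChEMBL RDF",
    version="13",
    issued=date(2012, 1, 1),                                # placeholder, not a real release date
    dump_url="http://example.org/downloads/chembl13.ttl",   # hypothetical URL
)
print(turtle)
```

Recording the version and issue date in the description itself is what would let a consumer answer "what version did they load?" without relying on a registry.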

Editor's Notes

  1. Motivation from OPS; challenges; OPS approach; W3C HCLS work
  2. Reminder of current architecture
  3. ChemSpider: EBI SDF file, ChEMBL 13. Data Cache: Chem2Bio2RDF ChEMBL RDF; file downloaded May 2011; Chem2Bio2RDF metadata webpages: ChEMBL 8; file contents: ChEMBL 2. Mapping Server: Kasabi ChEMBL RDF file, ChEMBL 12
  4. Large number of datasets: differing update rates, different characteristics. Require automated process
  5. Specifies checklist of properties. Draws upon existing vocabularies. Aims to be simple to use: extensive guidance notes
  6. Checklist and guidance notes: user friendly. Minimal, easy-to-follow model. Draws upon existing vocabularies. Required and optional properties
  7. Agent-entity-action model can be cumbersome for datasets; agent not always known beyond data provider, i.e. not individual. Extension requirement is by design
  8. Provide two tools to help
  9. Dataset description creator. Generates outline description through web form. Allows you to see generated content
  10. Given a dataset description, does it conform to the OPS guidelines? Generates error (red) and warning (orange) reports. Error for MUST properties. Warning for SHOULD properties. Information for MAY properties
  11. Large community buy-in, including EBI. Builds on OPS document: checklist and guidance notes! Wide range of use cases. Should be finalised by end of May; not final URL
  12. Three-tier model: more complex. More required properties (not shown). Richer metadata
  13. Open PHACTS: 28 partners: 9 pharmaceuticals, 3 biotechs, 1 triplestore firm, 15 academic
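
Note 10 describes a validator that reports errors for MUST properties, warnings for SHOULD properties and information for MAY properties. The following Python sketch shows that severity mapping under stated assumptions: the property lists in `REQUIREMENTS` are hypothetical examples rather than the actual OPS checklist, the description is modelled as a plain dict instead of parsed RDF, and a property is reported only when it is missing.

```python
# Hypothetical requirement levels; the real OPS checklist is not reproduced here.
REQUIREMENTS = {
    "dcterms:title": "MUST",
    "dcterms:license": "MUST",
    "pav:version": "SHOULD",
    "void:dataDump": "MAY",
}

# Requirement level -> report severity, as described in note 10.
SEVERITY = {"MUST": "error", "SHOULD": "warning", "MAY": "info"}


def validate(description):
    """Return (severity, property) pairs for each checklist property that is missing."""
    report = []
    for prop, level in REQUIREMENTS.items():
        if not description.get(prop):
            report.append((SEVERITY[level], prop))
    return report


# A description with a title and a data dump, but no licence or version.
description = {
    "dcterms:title": "ChEMBL RDF",
    "void:dataDump": "http://example.org/downloads/chembl13.ttl",  # hypothetical URL
}
report = validate(description)
for severity, prop in report:
    print(f"{severity}: missing {prop}")
```

A real validator would operate on the RDF graph and also check value types; the sketch only shows how requirement levels could map to report severities.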