SlideShare a Scribd company logo
Validata: A tool for testing
profile conformance
Alasdair J G Gray
Heriot-Watt University
www.macs.hw.ac.uk/~ajg33
A.J.G.Gray@hw.ac.uk
@gray_alasdair
Andrew Beveridge
Jacob Baungard Hansen
Johnny Val
Leif Gehrmann
Roisin Farmer
Sunil Khutan
Tomas Robertson
HCLS Dataset Descriptions
https://www.w3.org/TR/hcls-dataset/
Dumontier M, Gray AJG, Marshall MS, et al. (2016) The health care
and life sciences community profile for dataset descriptions.
PeerJ 4:e2331 https://doi.org/10.7717/peerj.2331
1 December 2016
@gray_alasdair
www.macs.hw.ac.uk/~ajg33
2
Requirements
• Online tool
– Deployable on W3C
server
– GUI
– API
• Support multiple
constraints
– Properties
– Data values
– …
• Requirement levels
– Different levels of
user messages:
Error, Warning,
Information
• Configurable
– HCLS (Required)
– DCAT, Open
PHACTS, etc
(Optional)
1 December 2016
@gray_alasdair
www.macs.hw.ac.uk/~ajg33
3
Example Constraint
1 December 2016 4
• Shape
• A Dataset
– MUST be declared to be of type dctype:Dataset
– MUST have a dcterms:title as a language typed
string
– MUST NOT have dcterms:created date
<Dataset> rdf:langString
.
✗
@gray_alasdair
www.macs.hw.ac.uk/~ajg33
Dates are associated
with versions in HCLS
Example Validation
1 December 2016 5
<Dataset> rdf:langString
.
✗
@gray_alasdair
www.macs.hw.ac.uk/~ajg33
• Shape
• Data
Example Validation
• Shape
• Data
1 December 2016 6
<Dataset> rdf:langString
.
✗
@gray_alasdair
www.macs.hw.ac.uk/~ajg33
Example Validation
1 December 2016 7
<Dataset> rdf:langString
.
✗
@gray_alasdair
www.macs.hw.ac.uk/~ajg33
• Shape
• Data
<Dataset> {
rdf:type (dctypes:Dataset),
dct:title rdf:langString,
dct:alternative rdf:langString+,
!dct:created .
}
Shape
1 December 2016 8
<Dataset> rdf:langString
.
✗
@gray_alasdair
www.macs.hw.ac.uk/~ajg33
Shape Expressions (ShEx)
1 December 2016 9
@gray_alasdair
www.macs.hw.ac.uk/~ajg33
ShEx: Validation
<Dataset> {
rdf:type (dctypes:Dataset),
dct:title rdf:langString,
dct:alternative rdf:langString+,
!dct:created .
}
<Dataset> {
rdf:type (dctypes:Dataset),
dct:title rdf:langString,
dct:alternative rdf:langString+,
!dct:created .
}
<Dataset> {
rdf:type (dctypes:Dataset),
dct:title rdf:langString,
dct:alternative rdf:langString+,
!dct:created .
}
<Dataset> {
rdf:type (dctypes:Dataset),
dct:title rdf:langString,
dct:alternative rdf:langString+,
!dct:created .
}
<Dataset> {
rdf:type (dctypes:Dataset),
dct:title rdf:langString,
dct:alternative rdf:langString+,
!dct:created .
}
<Dataset> {
rdf:type (dctypes:Dataset),
dct:title rdf:langString,
dct:alternative rdf:langString+,
!dct:created .
}
Validator can’t warn of
missing property
Example data
<Dataset> {
`MUST` rdf:type (dctypes:Dataset),
`MUST` dct:title rdf:langString,
`MAY` dct:alternative rdf:langString+,
`MUST` !dct:created .
}
Shape
1 December 2016 10
<Dataset> rdf:langString
.
✗
@gray_alasdair
www.macs.hw.ac.uk/~ajg33
Requirement Levels
Validator can warn of
missing property
Implementation
Validata
• Web app front end
• Javascript + HTML
• Relies on ShEx-validator
– Validates documents
– Returns report
https://github.com/HW-
SWeL/Validata
ShEx-validator
• Validation system
• Validation API
• Javascript
– nodejs engine
• Reuses
– n3: RDF Library
– ShExParser
https://github.com/HW-
SWeL/ShEx-validator
1 December 2016
@gray_alasdair
www.macs.hw.ac.uk/~ajg33
11
http://hw-swel.github.io/Validata/
VALIDATA DEMO
Validata
https://github.com/HW-SWeL/Validata
• RDF constraint validation tool
– Configurable to any profile
• Shape Expression (ShEx) constraints
• Open source javascript implementation
www.macs.hw.ac.uk/~ajg33/
A.J.G.Gray@hw.ac.uk
@gray_alasdair

More Related Content

What's hot

Data curation at Dryad Digital Repository: A former curator's perspective
Data curation at Dryad Digital Repository: A former curator's perspectiveData curation at Dryad Digital Repository: A former curator's perspective
Data curation at Dryad Digital Repository: A former curator's perspective
Jane Frazier
 
Citations needed for the sum of all human knowledge: Wikidata as the missing ...
Citations needed for the sum of all human knowledge: Wikidata as the missing ...Citations needed for the sum of all human knowledge: Wikidata as the missing ...
Citations needed for the sum of all human knowledge: Wikidata as the missing ...
Dario Taraborelli
 
Data Repositories Impact
Data Repositories ImpactData Repositories Impact
Data Repositories Impact
Merce Crosas
 
Verifiable, linked open knowledge that anyone can edit
Verifiable, linked open knowledge that anyone can editVerifiable, linked open knowledge that anyone can edit
Verifiable, linked open knowledge that anyone can edit
Dario Taraborelli
 
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Merce Crosas
 
Using Funding Data
Using Funding DataUsing Funding Data
Using Funding Data
Crossref
 
Yosemite part-4 webinar-final
Yosemite part-4 webinar-finalYosemite part-4 webinar-final
Yosemite part-4 webinar-final
DATAVERSITY
 
Introduction-and-RDF-Representation-of-FHIR-for-Clinical-Data
Introduction-and-RDF-Representation-of-FHIR-for-Clinical-DataIntroduction-and-RDF-Representation-of-FHIR-for-Clinical-Data
Introduction-and-RDF-Representation-of-FHIR-for-Clinical-Data
DATAVERSITY
 
Connecting Dataverse with the Research Life Cycle
Connecting Dataverse with the Research Life CycleConnecting Dataverse with the Research Life Cycle
Connecting Dataverse with the Research Life Cycle
Merce Crosas
 
DataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse IntegrationDataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse Integration
Michael Bar-Sinai
 
Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...
Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...
Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...
DATAVERSITY
 
ORCID: An Overview - Alice Meadows
ORCID: An Overview - Alice MeadowsORCID: An Overview - Alice Meadows
ORCID: An Overview - Alice Meadows
Crossref
 
FundRef Webinar
FundRef WebinarFundRef Webinar
FundRef Webinar
Crossref
 
BibBase Linked Data Triplification Challenge 2010 Presentation
BibBase Linked Data Triplification Challenge 2010 PresentationBibBase Linked Data Triplification Challenge 2010 Presentation
BibBase Linked Data Triplification Challenge 2010 Presentation
Reynold Xin
 
Wikidata and the Semantic Web of Food
Wikidata and the  Semantic Web of FoodWikidata and the  Semantic Web of Food
Wikidata and the Semantic Web of Food
Benjamin Good
 
Who is using your metadata - Ginny Hendricks
Who is using your metadata - Ginny HendricksWho is using your metadata - Ginny Hendricks
Who is using your metadata - Ginny Hendricks
Crossref
 
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can EditWikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Dario Taraborelli
 
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...Maulik Kamdar
 
Creating Incentives
Creating IncentivesCreating Incentives
Creating Incentives
datacite
 
Open semantic chemical structures
Open semantic chemical structuresOpen semantic chemical structures
Open semantic chemical structures
Stuart Chalk
 

What's hot (20)

Data curation at Dryad Digital Repository: A former curator's perspective
Data curation at Dryad Digital Repository: A former curator's perspectiveData curation at Dryad Digital Repository: A former curator's perspective
Data curation at Dryad Digital Repository: A former curator's perspective
 
Citations needed for the sum of all human knowledge: Wikidata as the missing ...
Citations needed for the sum of all human knowledge: Wikidata as the missing ...Citations needed for the sum of all human knowledge: Wikidata as the missing ...
Citations needed for the sum of all human knowledge: Wikidata as the missing ...
 
Data Repositories Impact
Data Repositories ImpactData Repositories Impact
Data Repositories Impact
 
Verifiable, linked open knowledge that anyone can edit
Verifiable, linked open knowledge that anyone can editVerifiable, linked open knowledge that anyone can edit
Verifiable, linked open knowledge that anyone can edit
 
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
Open Source Tools Facilitating Sharing/Protecting Privacy: Dataverse and Data...
 
Using Funding Data
Using Funding DataUsing Funding Data
Using Funding Data
 
Yosemite part-4 webinar-final
Yosemite part-4 webinar-finalYosemite part-4 webinar-final
Yosemite part-4 webinar-final
 
Introduction-and-RDF-Representation-of-FHIR-for-Clinical-Data
Introduction-and-RDF-Representation-of-FHIR-for-Clinical-DataIntroduction-and-RDF-Representation-of-FHIR-for-Clinical-Data
Introduction-and-RDF-Representation-of-FHIR-for-Clinical-Data
 
Connecting Dataverse with the Research Life Cycle
Connecting Dataverse with the Research Life CycleConnecting Dataverse with the Research Life Cycle
Connecting Dataverse with the Research Life Cycle
 
DataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse IntegrationDataTags, The Tags Toolset, and Dataverse Integration
DataTags, The Tags Toolset, and Dataverse Integration
 
Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...
Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...
Yosemite Project - Part 3 - Transformations for Integrating VA data with FHIR...
 
ORCID: An Overview - Alice Meadows
ORCID: An Overview - Alice MeadowsORCID: An Overview - Alice Meadows
ORCID: An Overview - Alice Meadows
 
FundRef Webinar
FundRef WebinarFundRef Webinar
FundRef Webinar
 
BibBase Linked Data Triplification Challenge 2010 Presentation
BibBase Linked Data Triplification Challenge 2010 PresentationBibBase Linked Data Triplification Challenge 2010 Presentation
BibBase Linked Data Triplification Challenge 2010 Presentation
 
Wikidata and the Semantic Web of Food
Wikidata and the  Semantic Web of FoodWikidata and the  Semantic Web of Food
Wikidata and the Semantic Web of Food
 
Who is using your metadata - Ginny Hendricks
Who is using your metadata - Ginny HendricksWho is using your metadata - Ginny Hendricks
Who is using your metadata - Ginny Hendricks
 
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can EditWikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
Wikidata: Verifiable, Linked Open Knowledge That Anyone Can Edit
 
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
ReVeaLD: A user-driven domain-specific interactive search platform for biomed...
 
Creating Incentives
Creating IncentivesCreating Incentives
Creating Incentives
 
Open semantic chemical structures
Open semantic chemical structuresOpen semantic chemical structures
Open semantic chemical structures
 

Similar to Validata: A tool for testing profile conformance

Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
Carole Goble
 
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biologHowe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biologEleanor Howe
 
Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...
Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...
Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...
GigaScience, BGI Hong Kong
 
Dataverse on the MOC
Dataverse on the MOCDataverse on the MOC
Dataverse on the MOC
Merce Crosas
 
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds
 
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
GigaScience, BGI Hong Kong
 
National Data Archive (NADA) 3.0
National Data Archive (NADA) 3.0National Data Archive (NADA) 3.0
National Data Archive (NADA) 3.0
mehmood78
 
NIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasNIH Data Commons Architecture Ideas
NIH Data Commons Architecture Ideas
Ian Foster
 
Team Argon Summary
Team Argon SummaryTeam Argon Summary
Team Argon Summary
Ian Foster
 
Google Scholar's citation graph: comprehensive, global... and inaccessible
Google Scholar's citation graph: comprehensive, global... and inaccessibleGoogle Scholar's citation graph: comprehensive, global... and inaccessible
Google Scholar's citation graph: comprehensive, global... and inaccessible
albertomartin101
 
Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access Symposium
Merce Crosas
 
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized MetadataCEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
Ahmad C. Bukhari
 
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized MetadataCEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
Syed Ahmad Chan Bukhari, PhD
 
Building Protected Data Sharing Networks to Advance Cancer Risk Assessment an...
Building Protected Data Sharing Networks to Advance Cancer Risk Assessment an...Building Protected Data Sharing Networks to Advance Cancer Risk Assessment an...
Building Protected Data Sharing Networks to Advance Cancer Risk Assessment an...
Mary Bass
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
mantatheralyasriy
 
Crediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCrediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teams
Carole Goble
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
mantatheralyasriy
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
mantatheralyasriy
 
Data Standards & Best Practices for the Stratigraphic Record
Data Standards & Best Practices for the Stratigraphic RecordData Standards & Best Practices for the Stratigraphic Record
Data Standards & Best Practices for the Stratigraphic Record
Kerstin Lehnert
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Anita de Waard
 

Similar to Validata: A tool for testing profile conformance (20)

Being FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data ScienceBeing FAIR: Enabling Reproducible Data Science
Being FAIR: Enabling Reproducible Data Science
 
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biologHowe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
Howe et al. - 2015 - BioAssay Research Database (BARD) chemical biolog
 
Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...
Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...
Nicole Nogoy's talk at eResearchNZ 2014: Improving data sharing, integration ...
 
Dataverse on the MOC
Dataverse on the MOCDataverse on the MOC
Dataverse on the MOC
 
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
 
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
Scott Edmunds: Data Dissemination: Difficulties, Data Citation, DOI's (and Gi...
 
National Data Archive (NADA) 3.0
National Data Archive (NADA) 3.0National Data Archive (NADA) 3.0
National Data Archive (NADA) 3.0
 
NIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasNIH Data Commons Architecture Ideas
NIH Data Commons Architecture Ideas
 
Team Argon Summary
Team Argon SummaryTeam Argon Summary
Team Argon Summary
 
Google Scholar's citation graph: comprehensive, global... and inaccessible
Google Scholar's citation graph: comprehensive, global... and inaccessibleGoogle Scholar's citation graph: comprehensive, global... and inaccessible
Google Scholar's citation graph: comprehensive, global... and inaccessible
 
Data Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access SymposiumData Publishing at Harvard's Research Data Access Symposium
Data Publishing at Harvard's Research Data Access Symposium
 
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized MetadataCEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
 
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized MetadataCEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
CEDAR: Web-Based Tools for Accelerating the Creation of Standardized Metadata
 
Building Protected Data Sharing Networks to Advance Cancer Risk Assessment an...
Building Protected Data Sharing Networks to Advance Cancer Risk Assessment an...Building Protected Data Sharing Networks to Advance Cancer Risk Assessment an...
Building Protected Data Sharing Networks to Advance Cancer Risk Assessment an...
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Crediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teamsCrediting informatics and data folks in life science teams
Crediting informatics and data folks in life science teams
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Dataset Sources Repositories.pptx
Dataset Sources Repositories.pptxDataset Sources Repositories.pptx
Dataset Sources Repositories.pptx
 
Data Standards & Best Practices for the Stratigraphic Record
Data Standards & Best Practices for the Stratigraphic RecordData Standards & Best Practices for the Stratigraphic Record
Data Standards & Best Practices for the Stratigraphic Record
 
Data Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost RecoveryData Repositories: Recommendation, Certification and Models for Cost Recovery
Data Repositories: Recommendation, Certification and Models for Cost Recovery
 

More from Alasdair Gray

Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Alasdair Gray
 
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Alasdair Gray
 
Open PHACTS: The Data Today
Open PHACTS: The Data TodayOpen PHACTS: The Data Today
Open PHACTS: The Data Today
Alasdair Gray
 
Project X
Project XProject X
Project X
Alasdair Gray
 
Data Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case StudyData Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case Study
Alasdair Gray
 
Data Integration in a Big Data Context
Data Integration in a Big Data ContextData Integration in a Big Data Context
Data Integration in a Big Data Context
Alasdair Gray
 
Data Linkage
Data LinkageData Linkage
Data Linkage
Alasdair Gray
 
Scientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry dataScientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry data
Alasdair Gray
 
Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...
Alasdair Gray
 
Describing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileDescribing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community Profile
Alasdair Gray
 
SensorBench
SensorBenchSensorBench
SensorBench
Alasdair Gray
 
Data Science meets Linked Data
Data Science meets Linked DataData Science meets Linked Data
Data Science meets Linked Data
Alasdair Gray
 
Sensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingSensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingAlasdair Gray
 
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...Alasdair Gray
 
Dataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSDataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLS
Alasdair Gray
 
Computing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery DatasetsComputing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery Datasets
Alasdair Gray
 
Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Incorporating Commercial and Private Data into an Open Linked Data Platform f...Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Alasdair Gray
 
Including Co-Referent URIs in a SPARQL Query
Including Co-Referent URIs in a SPARQL QueryIncluding Co-Referent URIs in a SPARQL Query
Including Co-Referent URIs in a SPARQL Query
Alasdair Gray
 
2013 01-14 ops-dataset_descriptions
2013 01-14 ops-dataset_descriptions2013 01-14 ops-dataset_descriptions
2013 01-14 ops-dataset_descriptions
Alasdair Gray
 

More from Alasdair Gray (19)

Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
Using a Jupyter Notebook to perform a reproducible scientific analysis over s...
 
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...Bioschemas Community: Developing profiles over Schema.org to make life scienc...
Bioschemas Community: Developing profiles over Schema.org to make life scienc...
 
Open PHACTS: The Data Today
Open PHACTS: The Data TodayOpen PHACTS: The Data Today
Open PHACTS: The Data Today
 
Project X
Project XProject X
Project X
 
Data Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case StudyData Integration in a Big Data Context: An Open PHACTS Case Study
Data Integration in a Big Data Context: An Open PHACTS Case Study
 
Data Integration in a Big Data Context
Data Integration in a Big Data ContextData Integration in a Big Data Context
Data Integration in a Big Data Context
 
Data Linkage
Data LinkageData Linkage
Data Linkage
 
Scientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry dataScientific lenses to support multiple views over linked chemistry data
Scientific lenses to support multiple views over linked chemistry data
 
Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...Scientific Lenses over Linked Data An approach to support multiple integrate...
Scientific Lenses over Linked Data An approach to support multiple integrate...
 
Describing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community ProfileDescribing Scientific Datasets: The HCLS Community Profile
Describing Scientific Datasets: The HCLS Community Profile
 
SensorBench
SensorBenchSensorBench
SensorBench
 
Data Science meets Linked Data
Data Science meets Linked DataData Science meets Linked Data
Data Science meets Linked Data
 
Sensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-beingSensors and Big Data for Health and Well-being
Sensors and Big Data for Health and Well-being
 
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
Scientific Lenses over Linked Data: Identity Management in the Open PHACTS p...
 
Dataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLSDataset Descriptions in Open PHACTS and HCLS
Dataset Descriptions in Open PHACTS and HCLS
 
Computing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery DatasetsComputing Identity Co-Reference Across Drug Discovery Datasets
Computing Identity Co-Reference Across Drug Discovery Datasets
 
Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Incorporating Commercial and Private Data into an Open Linked Data Platform f...Incorporating Commercial and Private Data into an Open Linked Data Platform f...
Incorporating Commercial and Private Data into an Open Linked Data Platform f...
 
Including Co-Referent URIs in a SPARQL Query
Including Co-Referent URIs in a SPARQL QueryIncluding Co-Referent URIs in a SPARQL Query
Including Co-Referent URIs in a SPARQL Query
 
2013 01-14 ops-dataset_descriptions
2013 01-14 ops-dataset_descriptions2013 01-14 ops-dataset_descriptions
2013 01-14 ops-dataset_descriptions
 

Recently uploaded

Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
DianaGray10
 

Recently uploaded (20)

Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 

Validata: A tool for testing profile conformance

  • 1. Validata: A tool for testing profile conformance Alasdair J G Gray Heriot-Watt University www.macs.hw.ac.uk/~ajg33 A.J.G.Gray@hw.ac.uk @gray_alasdair Andrew Beveridge Jacob Baungard Hansen Johnny Val Leif Gehrmann Roisin Farmer Sunil Khutan Tomas Robertson
  • 2. HCLS Dataset Descriptions https://www.w3.org/TR/hcls-dataset/ Dumontier M, Gray AJG, Marshall MS, et al. (2016) The health care and life sciences community profile for dataset descriptions. PeerJ 4:e2331 https://doi.org/10.7717/peerj.2331 1 December 2016 @gray_alasdair www.macs.hw.ac.uk/~ajg33 2
  • 3. Requirements • Online tool – Deployable on W3C server – GUI – API • Support multiple constraints – Properties – Data values – … • Requirement levels – Different levels of user messages: Error, Warning, Information • Configurable – HCLS (Required) – DCAT, Open PHACTS, etc (Optional) 1 December 2016 @gray_alasdair www.macs.hw.ac.uk/~ajg33 3
  • 4. Example Constraint 1 December 2016 4 • Shape • A Dataset – MUST be declared to be of type dctype:Dataset – MUST have a dcterms:title as a language typed string – MUST NOT have dcterms:created date <Dataset> rdf:langString . ✗ @gray_alasdair www.macs.hw.ac.uk/~ajg33 Dates are associated with versions in HCLS
  • 5. Example Validation 1 December 2016 5 <Dataset> rdf:langString . ✗ @gray_alasdair www.macs.hw.ac.uk/~ajg33 • Shape • Data
  • 6. Example Validation • Shape • Data 1 December 2016 6 <Dataset> rdf:langString . ✗ @gray_alasdair www.macs.hw.ac.uk/~ajg33
  • 7. Example Validation 1 December 2016 7 <Dataset> rdf:langString . ✗ @gray_alasdair www.macs.hw.ac.uk/~ajg33 • Shape • Data
  • 8. <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } Shape 1 December 2016 8 <Dataset> rdf:langString . ✗ @gray_alasdair www.macs.hw.ac.uk/~ajg33 Shape Expressions (ShEx)
  • 9. 1 December 2016 9 @gray_alasdair www.macs.hw.ac.uk/~ajg33 ShEx: Validation <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } <Dataset> { rdf:type (dctypes:Dataset), dct:title rdf:langString, dct:alternative rdf:langString+, !dct:created . } Validator can’t warn of missing property Example data
  • 10. <Dataset> { `MUST` rdf:type (dctypes:Dataset), `MUST` dct:title rdf:langString, `MAY` dct:alternative rdf:langString+, `MUST` !dct:created . } Shape 1 December 2016 10 <Dataset> rdf:langString . ✗ @gray_alasdair www.macs.hw.ac.uk/~ajg33 Requirement Levels Validator can warn of missing property
  • 11. Implementation Validata • Web app front end • Javascript + HTML • Relies on ShEx-validator – Validates documents – Returns report https://github.com/HW- SWeL/Validata ShEx-validator • Validation system • Validation API • Javascript – nodejs engine • Reuses – n3: RDF Library – ShExParser https://github.com/HW- SWeL/ShEx-validator 1 December 2016 @gray_alasdair www.macs.hw.ac.uk/~ajg33 11
  • 13. Validata https://github.com/HW-SWeL/Validata • RDF constraint validation tool – Configurable to any profile • Shape Expression (ShEx) constraints • Open source javascript implementation www.macs.hw.ac.uk/~ajg33/ A.J.G.Gray@hw.ac.uk @gray_alasdair

Editor's Notes

  1. Motivation: how do we check descriptions conform? Summary level: time unchanging information, e.g. name, description, publisher Version level: version specific information, e.g. version number, creator, etc Distribution level: file specific information, e.g. file location and format, number of triples 18 vocabularies: DCTerms, DCAT, VoID, FOAF, … 61 prescribed properties: MUST, SHOULD, MAY, MUST NOT for each level
  2. Link into data publishing pipeline via API Not tied to HCLS, only a motivation No existing tool meets these needs
  3. Constraints form a graph pattern that data must comply with
  4. How do we validate that our example data conforms to a certain shape Express expected shape as ShEx Toy example, what about for real
  5. How do we validate that our example data conforms to a certain shape Express expected shape as ShEx Toy example, what about for real
  6. How do we validate that our example data conforms to a certain shape Express expected shape as ShEx Toy example, what about for real
  7. ShEx: Concise notation regex based W3C SHACL not stable when work done ShEx is an implementation of SHACL with extra features
  8. Step through validation process
  9. Extended ShEx to allow arbitrary hierarchies Toy example, what about for real
  10. ShEx-validator has other dependencies too Minimist: arguments parser Promise: call backs Pegjs: parser generator Mocha: test driven development