Your SlideShare is downloading. ×
0
Dataset Descriptions in Open PHACTS and HCLS

This presentation gives an overview of the dataset description specification developed in the Open PHACTS project (http://www.openphacts.org/). The creation of the specification was driven by a real need within the project to track the datasets used.

Details are given of the dataset metadata captured and the vocabularies used to model it, together with the tools developed to support uptake of the specification.
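The outline-generation step performed by a description creator tool can be sketched as below. This is a minimal illustration under stated assumptions, not the Open PHACTS tool: the form keys, the `FIELDS` table, the `outline_description` function and the example URI are hypothetical; only the vocabulary namespaces (VoID, Dublin Core terms, PAV) are real. Blank form fields are simply omitted, so the user can see what is still missing.

```python
# Minimal sketch of a "dataset description creator": turn web-form fields
# into an outline VoID description in Turtle. The form keys and the choice
# of properties are hypothetical; only the vocabulary namespaces are real.

PREFIXES = {
    "void": "http://rdfs.org/ns/void#",
    "dcterms": "http://purl.org/dc/terms/",
    "pav": "http://purl.org/pav/",
}

# Hypothetical form key -> (property, value is a URI rather than a literal).
FIELDS = [
    ("title", "dcterms:title", False),
    ("description", "dcterms:description", False),
    ("publisher", "dcterms:publisher", True),
    ("license", "dcterms:license", True),
    ("version", "pav:version", False),
]


def turtle_literal(value: str) -> str:
    """Quote a string as a Turtle short literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def outline_description(dataset_uri: str, form: dict) -> str:
    """Build an outline void:Dataset description; blank fields are omitted."""
    lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
    lines.append("")
    pairs = [("a", "void:Dataset")]
    for key, prop, is_uri in FIELDS:
        value = form.get(key)
        if value:  # skip missing or empty form fields
            pairs.append((prop, f"<{value}>" if is_uri else turtle_literal(value)))
    body = " ;\n    ".join(f"{p} {o}" for p, o in pairs)
    lines.append(f"<{dataset_uri}>\n    {body} .")
    return "\n".join(lines)


print(outline_description("http://example.org/dataset/chembl",
                          {"title": "ChEMBL", "version": "13"}))
```

Emitting Turtle text directly keeps the sketch dependency-free; a real tool would more likely build an RDF graph and serialise it.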

Over the course of the last 12 months, the W3C Health Care and Life Sciences (HCLS) Interest Group has been developing a community profile for dataset descriptions, drawing on the ideas developed in the Open PHACTS specification. The presentation gives a brief overview of the forthcoming community profile.
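The HCLS profile organises a description into three levels: a version-independent summary, a version level, and one or more distributions of each version. The sketch below illustrates how those levels link together. It assumes a version points to its summary with `dct:isVersionOf` and to each distribution with `dcat:distribution`; the function name and URI pattern are hypothetical, and the profile itself requires many more properties than are shown here.

```python
# Sketch of the HCLS three-level model: a summary (version-independent)
# description, a version-level description, and its distributions.
# Linking properties follow the HCLS profile as we understand it
# (dct:isVersionOf, dcat:distribution); URI patterns are hypothetical.
from typing import List, Tuple

Triple = Tuple[str, str, str]


def three_level_description(base: str, version: str, formats: List[str]) -> List[Triple]:
    """Return CURIE-style triples for one summary, one version and its distributions."""
    summary = f"<{base}>"
    ver = f"<{base}/{version}>"
    triples: List[Triple] = [
        (summary, "rdf:type", "dctypes:Dataset"),
        (ver, "rdf:type", "dctypes:Dataset"),
        (ver, "dct:isVersionOf", summary),
        (ver, "pav:version", f'"{version}"'),
    ]
    for fmt in formats:
        dist = f"<{base}/{version}/{fmt}>"  # hypothetical URI pattern
        triples.append((ver, "dcat:distribution", dist))
        triples.append((dist, "rdf:type", "dcat:Distribution"))
    return triples


for s, p, o in three_level_description("http://example.org/chembl", "13", ["rdf", "sdf"]):
    print(s, p, o, ".")
```

Splitting the levels means a new release only adds a version node and its distributions, while the summary description is written once.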

This presentation was given to the Network Data Exchange project http://www.ndexbio.org/ on 2 April 2014.

  • Motivation from OPS; challenges; OPS approach; W3C HCLS work
  • Reminder of current architecture
  • ChemSpider: EBI SDF file, ChEMBL 13. Data Cache: Chem2Bio2RDF ChEMBL RDF file downloaded May 2011; Chem2Bio2RDF metadata webpages: ChEMBL 8; file contents: ChEMBL 2. Mapping Server: Kasabi ChEMBL RDF file, ChEMBL 12.
  • Large number of datasets with differing update rates and different characteristics: requires an automated process.
  • Specifies a checklist of properties; draws upon existing vocabularies; aims to be simple to use, with extensive guidance notes.
  • Checklist and guidance notes: user friendly; minimal, easy-to-follow model; draws upon existing vocabularies; required and optional properties.
  • The agent-entity-action model can be cumbersome for datasets: the agent is not always known beyond the data provider, i.e. it is not an individual. The extension requirement is by design.
  • Provide two tools to help
  • Dataset description creator: generates an outline description through a web form and lets you see the generated content.
  • Given a dataset description, checks whether it conforms to the OPS guidelines. Generates error (red) and warning (orange) reports: errors for MUST properties, warnings for SHOULD properties, information for MAY properties.
  • Large community buy-in, including EBI. Builds on the OPS document: checklist and guidance notes! Wide range of use cases. Should be finalised by the end of May (the URL shown is not final).
  • Three-tier model: more complex; more required properties (not shown); richer metadata.
  • Open PHACTS: 28 partners (9 pharmaceutical companies, 3 biotechs, 1 triplestore firm, 15 academic partners).
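The validator's mapping from requirement levels to report severities (MUST to error, SHOULD to warning, MAY to information) can be sketched as follows. The checklist entries are illustrative placeholders, not the actual OPS MUST/SHOULD/MAY lists, and a description is simplified to a dictionary keyed by property name; `validate`, `conforms` and `Finding` are hypothetical names.

```python
# Sketch of a dataset-description validator that maps requirement levels to
# report severities (MUST -> error, SHOULD -> warning, MAY -> information).
# The checklist entries are illustrative, NOT the actual OPS checklist.
from dataclasses import dataclass
from typing import Dict, List

CHECKLIST: Dict[str, str] = {
    "dcterms:title": "MUST",
    "dcterms:license": "MUST",
    "dcterms:description": "SHOULD",
    "pav:version": "SHOULD",
    "void:exampleResource": "MAY",
}

SEVERITY = {"MUST": "error", "SHOULD": "warning", "MAY": "information"}


@dataclass
class Finding:
    prop: str
    severity: str


def validate(description: Dict[str, object]) -> List[Finding]:
    """Report every checklist property missing from the description."""
    return [
        Finding(prop, SEVERITY[level])
        for prop, level in CHECKLIST.items()
        if prop not in description
    ]


def conforms(findings: List[Finding]) -> bool:
    """A description conforms when no MUST property is missing (no errors)."""
    return all(f.severity != "error" for f in findings)


report = validate({"dcterms:title": "ChEMBL", "pav:version": "13"})
for f in report:
    print(f.severity, f.prop)
print("conforms:", conforms(report))
```

Only missing MUST properties block conformance; warnings and information items are reported but do not fail the check.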
Transcript

    • 1. Dataset Descriptions in Open PHACTS and W3C HCLS IG
        Alasdair J G Gray, Heriot-Watt University
        www.alasdairjggray.co.uk, A.J.G.Gray@hw.ac.uk
        NDEx Call, April 2014
    • 2. [Figure: Open PHACTS architecture diagram. Core Platform components: Linked Data API (RDF/XML, TTL, JSON), Domain Specific Services, Semantic Workflow Engine, Identity Resolution Service, Identifier Management Service, Chemistry Registration Normalisation & Q/C, Indexing, Data Cache (Virtuoso Triple Store). Data sources: Public Content, Commercial, Public Ontologies, User Annotations, shown with VoID, Nanopub and Db labels. Consumers: Apps. Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”.]
    • 3. [Figure: the same architecture (Data Cache (Triple Store), Semantic Workflow Engine, Linked Data API (RDF/XML, TTL, JSON), Domain Specific Services, Identity Resolution Service, Identifier Management Service, Core Platform, Apps) with ChEMBL entering through different routes at different versions: SD (v13), ChEMBL-RDF (v12), Chem2Bio2RDF (v2 or v8).]
    • 4. ChemSpider
        • Data aggregator: over 400 sources
            – What data does it contain?
            – What version of ?? did they load?
            – When are new versions loaded?
        • OPS data covers
            – ChEBI
            – ChEMBL
            – DrugBank
        (2 April 2014, OPS Dataset Descriptions – A. J. G. Gray)
    • 5. Metadata Challenges
        • Datasets available
            – In many versions over time
            – In different formats
            – From many mirrors/registries
        • Datasets build on each other
        • Files do not carry metadata
        • Registries
            – Can be out-of-date
            – Can contain conflicting information
        Users require data provenance!
    • 6. (no transcript text)
    • 7. (no transcript text)
    • 8. Description Model
    • 9. Realisation of Dataset Descriptions
        • Needs to be incorporated into data publishing pipeline
        • Hard for publishers to provide conformant descriptions
            – Datasets are complex
            – Evolve over time
            – Seen as yet another burden
    • 10. VoID Editor
    • 11. Validator
    • 12. W3C HCLS Group
    • 13. HCLS Community Profile Model
    • 14. Future Vision
        Metadata: Write once, use many times
        • Provide rich and accurate provenance trail of data
            – Automatic pipeline from VoID file to registries
        • Align Open PHACTS with W3C HCLS
            – Update tools for HCLS profile
    • 15. A.J.G.Gray@hw.ac.uk www.alasdairjggray.co.uk www.openphacts.org
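The provenance problem described in the ChemSpider and metadata-challenges slides can be made concrete with a small check over the versions that each component, or each source of metadata, reports for a dataset. The ChEMBL versions below are taken from the slide notes; the record layout and the `inconsistent_datasets` function are hypothetical simplifications of reading per-component dataset descriptions.

```python
# Sketch: flag datasets that different components (or different sources of
# metadata) report at different versions. Record layout is hypothetical;
# the ChEMBL versions are those listed in the slide notes.
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Record = Tuple[str, str, str]  # (component or source, dataset, version)


def inconsistent_datasets(records: List[Record]) -> Dict[str, Dict[str, Set[str]]]:
    """Map each dataset seen at more than one version to {version: sources}."""
    seen: Dict[str, Dict[str, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for source, dataset, version in records:
        seen[dataset][version].add(source)
    return {
        dataset: dict(versions)
        for dataset, versions in seen.items()
        if len(versions) > 1
    }


records = [
    ("ChemSpider (EBI SDF file)", "ChEMBL", "13"),
    ("Data Cache (Chem2Bio2RDF metadata webpages)", "ChEMBL", "8"),
    ("Data Cache (Chem2Bio2RDF file contents)", "ChEMBL", "2"),
    ("Mapping Server (Kasabi ChEMBL RDF file)", "ChEMBL", "12"),
]
for dataset, versions in inconsistent_datasets(records).items():
    for version, sources in sorted(versions.items()):
        print(dataset, version, sorted(sources))
```

With accurate, machine-readable descriptions for every loaded dataset, this kind of check can run automatically rather than relying on registries that may be out of date.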
