Science is rapidly moving into the electronic realm, and electronic laboratory notebooks (ELNs) are a big part of this shift. Representing the scientific process in the context of an ELN is an important part of making the data recorded in ELNs semantically integrated.
This presentation outlined initial developments of an Electronic Notebook Ontology (ENO) that will help tie together the ExptML ontology, HCLS Community Profile data descriptions, and the VIVO-ISF ontology.
2. Motivation
Inspiration
Electronic Scientific Notebooks
The Experiment Markup Language
VIVO-ISF Ontology
HCLS Community Profiles
Analysis
Important Questions
Ontology
Conclusion
Outline
3. There’s something missing from the big data landscape in science…
VIVO captures data about scientists (faculty)…
…but not about the data they produce
HCLS Community Profile outlines metadata for describing datasets but does not mention laboratory notebooks
Electronic laboratory notebooks are set to become the standard way scientists capture data
How do we link these together?
Motivation
5. Scientists need to move to digital notebooks…
…and record not just the data but the flow and context
Traditional Laboratory Notebooks
How science is done is important for searching, aggregation, and meta-analysis
6. Developed out of Laboratory Information Management Systems (LIMS)
Content Management System for Scientists
Storage of
Research data
Research resources (instruments, samples, scientists)
The story of the scientific endeavor
Link to external resources
Display chemical structures
Allow aggregation, processing of data
Be compliant with industry standard record keeping
Electronic Laboratory Notebooks
8. A specification (written in XML) that describes different types of information recorded during the scientific process (http://exptml.sourceforge.net)
Many datatypes (will expand…)
Experiment Markup Language (ExptML)
Annotation
Api
Calculation
Chemical
Citation
Communication
Customer
Data
Dataset
Definition
Element
Equipment
Event
Experiment
Group
Project
Protocol
Quote
Report
Result
Sample
Solution
Space
Specimen
Substance
Task
Template
Timeline
User
Vendor
11. The Health Care and Life Sciences (HCLS) Community Profile is a Note from the Semantic Web HCLS Interest Group
Access to consistent, high-quality metadata is critical to finding, understanding, and reusing scientific data. This document describes a consensus among participating stakeholders in the Health Care and the Life Sciences domain on the description of datasets using the Resource Description Framework (RDF). This specification meets key functional requirements, reuses existing vocabularies to the extent that it is possible, and addresses elements of data description, versioning, provenance, discovery, exchange, query, and retrieval.
Data Descriptions:
HCLS Community Profile
http://www.w3.org/TR/hcls-dataset/
12. Describes three levels for description of datasets
Summary Level
Type declaration (rdf:type = dctypes:Dataset)
Title (dct:title = rdf:langString)
Description (dct:description = rdf:langString)
Publisher (dct:publisher = IRI)
Version Level
Type declaration (rdf:type = dctypes:Dataset)
Title (dct:title = rdf:langString)
Description (dct:description = rdf:langString)
Creator (dct:creator = IRI)
Publisher (dct:publisher = IRI)
Version identifier (pav:version = xsd:string)
Version linking (dct:isVersionOf = IRI)
Distribution Level
Type declaration (rdf:type = void:Dataset OR dcat:Distribution)
Title (dct:title = rdf:langString)
Description (dct:description = rdf:langString)
Creator (dct:creator = IRI)
Publisher (dct:publisher = IRI)
License (dct:license = IRI)
Data Descriptions:
HCLS Community Profile
http://www.w3.org/TR/hcls-dataset/#datasetdescriptionlevels
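As a rough illustration of the summary level listed on this slide, the following Python sketch writes the four summary-level elements (type, title, description, publisher) as Turtle. The dataset and publisher IRIs, the text values, and the helper names turtle_literal and summary_level_turtle are hypothetical and not taken from the presentation; the prefix IRIs are the usual RDF, DCMI Terms, and DCMI Type namespaces. It is a sketch under those assumptions, not a complete HCLS serializer.

```python
# Illustrative sketch only: IRIs and text values below are made-up examples.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dct": "http://purl.org/dc/terms/",
    "dctypes": "http://purl.org/dc/dcmitype/",
}


def turtle_literal(text, lang="en"):
    """Quote a string as a language-tagged Turtle literal (rdf:langString)."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"@{lang}'


def summary_level_turtle(dataset_iri, title, description, publisher_iri, lang="en"):
    """Serialize the four summary-level elements from the slide as Turtle."""
    lines = [f"@prefix {prefix}: <{iri}> ." for prefix, iri in PREFIXES.items()]
    lines.append("")
    lines.append(f"<{dataset_iri}>")
    lines.append("    rdf:type dctypes:Dataset ;")
    lines.append(f"    dct:title {turtle_literal(title, lang)} ;")
    lines.append(f"    dct:description {turtle_literal(description, lang)} ;")
    lines.append(f"    dct:publisher <{publisher_iri}> .")
    return "\n".join(lines)


# Hypothetical notebook dataset, used only to show the call.
example = summary_level_turtle(
    "http://example.org/eln/dataset/42",
    "Titration series",
    "pH readings recorded in an electronic notebook",
    "http://example.org/org/chemistry-department",
)
print(example)
```

Building the string by hand keeps the sketch free of third-party dependencies; a production ELN would more likely use an RDF library to handle escaping and validation.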
13. Goal: Automated identification of datasets that could be made searchable and/or distributable
When an ELN functions, what does it do?
Orchestrates access to the system (authentication)
Supplies a GUI to allow information to be
Displayed
Entered
Processed
Processes files to bring them into the system
Sends requests to internal/external servers to get data
Analysis
14. Is this information a dataset?
Does the dataset belong to this author?
Is the dataset available?
Is there appropriate metadata?
At what HCLS levels can this dataset be made available?
What mechanism is used to make the dataset available?
Important Questions
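One way the question about HCLS levels might be approached is a simple presence check against the elements listed for each level. The Python sketch below assumes, hypothetically, that an ELN can flatten a dataset's metadata into a dictionary keyed by prefixed property name, and that the license is recorded with dct:license. The names ALLOWED_TYPES, REQUIRED_PROPERTIES, missing_elements, and levels_satisfied are illustrative. Only the presence of each element is checked, not the profile's full set of rules.

```python
# Illustrative sketch: element lists follow the three levels named in the
# presentation; the flat-dictionary metadata format is an assumption.
ALLOWED_TYPES = {
    "summary": {"dctypes:Dataset"},
    "version": {"dctypes:Dataset"},
    "distribution": {"void:Dataset", "dcat:Distribution"},
}

REQUIRED_PROPERTIES = {
    "summary": ("dct:title", "dct:description", "dct:publisher"),
    "version": (
        "dct:title",
        "dct:description",
        "dct:creator",
        "dct:publisher",
        "pav:version",
        "dct:isVersionOf",
    ),
    "distribution": (
        "dct:title",
        "dct:description",
        "dct:creator",
        "dct:publisher",
        "dct:license",
    ),
}


def missing_elements(metadata, level):
    """Return the listed elements that are absent or empty for one level."""
    missing = [prop for prop in REQUIRED_PROPERTIES[level] if not metadata.get(prop)]
    if metadata.get("rdf:type") not in ALLOWED_TYPES[level]:
        missing.insert(0, "rdf:type")
    return missing


def levels_satisfied(metadata):
    """Return the levels whose listed elements are all present."""
    return [level for level in REQUIRED_PROPERTIES if not missing_elements(metadata, level)]


# Hypothetical record as an ELN might hold it.
record = {
    "rdf:type": "dctypes:Dataset",
    "dct:title": "Titration series",
    "dct:description": "pH readings recorded in an electronic notebook",
    "dct:publisher": "http://example.org/org/chemistry-department",
}
print(levels_satisfied(record))
print(missing_elements(record, "version"))
```

Reporting the missing elements, rather than a bare yes or no, would let an ELN prompt the user for exactly the metadata needed to reach the next level.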
15. Actions that deal with datasets
Software actions
User actions
Clues that something is research data
(not metadata or someone else’s data)
Collection of metadata for annotation of datasets
Inference that an HCLS dataset has been created
Dataset Identification
19. Providing a mechanism to link research data to VIVO profiles would
Add value to VIVO
Provide faculty with a resource for their data management plans
Create opportunities for automatic aggregation of research data into institutional repositories
Needs to be implemented in a test ELN…
Take Home