The Center for Expanded Data Annotation and Retrieval (CEDAR) has developed a suite of tools and services that allow scientists to create and publish metadata describing scientific experiments. Using these tools and services—referred to collectively as the CEDAR Workbench—scientists can collaboratively author metadata and submit them to public repositories. A key focus of our software is semantically enriching metadata with ontology terms. The system combines emerging technologies, such as JSON-LD and graph databases, with modern software development technologies, such as microservices and container platforms. The result is a suite of user-friendly, Web-based tools and REST APIs that provide a versatile end-to-end solution to the problems of metadata authoring and management. This talk presents the architecture of the CEDAR Workbench and focuses on the technology choices made to construct an easily usable, open system that allows users to create and publish semantically enriched metadata in standard Web formats.
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (SWAT4LS 2017 Conference)
1. Attila L. Egyedi, Martin O’Connor, Marcos Martínez-Romero, Debra Willrett,
Josef Hardi, John Graybeal and Mark Musen
Biomedical Informatics Research
Stanford University
Stanford, California, USA
2. What are metadata?
Data that describe data
Crucial for:
• Finding experimental datasets online
• Understanding how the experiments were performed
• Reusing the data to perform new analyses
4. age
Age
AGE
`Age
age (after birth)
age (in years)
age (y)
age (year)
age (years)
Age (years)
Age (Years)
age (yr)
age (yr-old)
age (yrs)
Age (yrs)
age [y]
age [year]
age [years]
age in years
age of patient
Age of patient
age of subjects
age(years)
Age(years)
Age(yrs.)
Age, year
age, years
age, yrs
age.year
age_years
Metadata quality is poor
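The field-name variants listed on this slide differ mostly in case, punctuation, and unit spelling. A minimal sketch, assuming a hypothetical helper `normalize_field_name` and an illustrative, incomplete set of unit spellings (neither comes from the talk), shows how far plain string cleanup gets:

```python
import re

# Assumed, incomplete set of spellings that all mean "years".
YEAR_SPELLINGS = {"y", "yr", "yrs", "year", "years"}


def normalize_field_name(name: str) -> str:
    """Collapse case, punctuation, and unit spellings in a field name."""
    text = name.lower()
    # Replace every run of non-alphanumeric characters with one space.
    text = re.sub(r"[^a-z0-9]+", " ", text).strip()
    # Map each unit spelling to the single token "years".
    tokens = ["years" if t in YEAR_SPELLINGS else t for t in text.split()]
    # Drop the filler word "in" so "age in years" matches "age (years)".
    tokens = [t for t in tokens if t != "in"]
    return "_".join(tokens)


variants = [
    "age", "Age", "AGE", "`Age", "age (y)", "Age (Years)", "age [year]",
    "Age(yrs.)", "age, yrs", "age.year", "age_years", "age in years",
]
normalized = {normalize_field_name(v) for v in variants}
print(sorted(normalized))
```

Variants such as "age of patient" or "age (after birth)" still fall outside this cleanup, which is one reason to constrain field names when metadata are authored rather than repair them afterwards.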
5. Gonçalves, R. S. et al. (2017). Metadata in the BioSample Online Repository are Impaired by
Numerous Anomalies. SemSci 2017 Workshop, co-located with ISWC 2017. Vienna, Austria.
Value type      Invalid %   Example values
Boolean         73%         nonsmoker, former-smoker
Integer         26%         JM52, UVPgt59.4, pig
Ontology term   68%         presumed normal, wild_type
An analysis of metadata from NCBI’s BioSample repository
Metadata quality is poor
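The kind of check behind the table above can be sketched as follows. This is not the method of the cited analysis: the function names, the set of accepted Boolean spellings, and the integer pattern are all assumptions made for illustration.

```python
import re

# Assumed set of spellings accepted as Boolean values.
BOOLEAN_LITERALS = {"true", "false", "yes", "no", "0", "1"}
# Optional sign followed by digits only.
INTEGER_PATTERN = re.compile(r"[+-]?\d+")


def is_valid(value: str, value_type: str) -> bool:
    """Check one metadata value against its declared type."""
    text = value.strip()
    if value_type == "boolean":
        return text.lower() in BOOLEAN_LITERALS
    if value_type == "integer":
        return INTEGER_PATTERN.fullmatch(text) is not None
    raise ValueError(f"unsupported value type: {value_type}")


def invalid_fraction(values, value_type):
    """Return the fraction of values that fail the type check."""
    if not values:
        return 0.0
    invalid = [v for v in values if not is_valid(v, value_type)]
    return len(invalid) / len(values)


# Example values taken from the table, plus one valid value per type.
print(invalid_fraction(["nonsmoker", "former-smoker", "true"], "boolean"))
print(invalid_fraction(["JM52", "UVPgt59.4", "pig", "42"], "integer"))
```

Ontology-term fields need a lookup against a terminology service rather than a pattern, so they are left out of this sketch.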
7. Our solution: CEDAR
• A web application for metadata management and submission
• Goal: Overcome the impediments to creating high-quality metadata
8. CEDAR metadata pipeline
DESIGN TEMPLATE → FILL IN METADATA → SUBMIT METADATA
• Design template: template authors (e.g., standards committees), Template Designer
• Fill in metadata: metadata authors (e.g., scientists), Metadata Editor
• Submit metadata: Metadata Repository (templates and metadata), LINCS, public databases
Template Designer: https://cedar.metadatacenter.org/templates/edit/https://repo.metadatacenter.org/templates/ab105771-564e-42a1-9be4-5a63891…
Metadata Editor: https://cedar.metadatacenter.org/instances/edit/https://repo.metadatacenter.org/template-instances/d4f1059e-8e27-4166-902f-…
Sample values shown in the editor: A sample study; Acute stress disorder; Stanford University; John Doe; Longitudinal
9. CEDAR System Architecture
[Architecture diagram; recoverable labels grouped by layer]
• Front end: Template Designer, Metadata Editor (also labeled Metadata Creator), Resource Manager
• Services (labeled Open Services): User Service, Group Service, Auth. Service (Keycloak), Resource Service, Workspace Service, Template Service, Value Recommender Service, Terminology Service, Submission Service, Messaging Service, Worker Service
• Service groupings: user management, resource management, intelligent authoring, controlled terms
• Template Model: Templates, Elements, Fields, Metadata
• Storage: Metadata Repository (MongoDB); Folders, Groups & Permissions (Neo4j DB); Users (MongoDB); Messages (MySQL); Search Engine (Elasticsearch); Queues & Caching (Redis)
• Data-flow labels: user profiles, user authorization, messages, metadata export
• External systems: NCBO BioPortal; public databases (r1, r2, …, rn)
• Legend: third-party components; CEDAR components; only internal access
10. CEDAR Template Model
O’Connor et al.: An open repository model for acquiring knowledge
about scientific experiments. Proceedings of the 20th International
Conference on Knowledge Engineering and Knowledge Management
(EKAW2016), 2016.
• Templates, Elements, Fields: JSON Schema + JSON-LD
• Metadata: JSON-LD
Storage
Open Services
Front End
Template
Model
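To make the split between JSON Schema plus JSON-LD (templates) and JSON-LD (metadata) concrete, here is a minimal sketch. It is not the actual CEDAR Template Model: every example.org IRI, the field names `studyName` and `disease`, and the helper `missing_required` are assumptions, and the check covers only the `required` and `properties` keywords. The sample values come from the pipeline slide.

```python
import json

# Hypothetical template: a JSON Schema document that also carries JSON-LD keywords.
template = {
    "@id": "https://example.org/templates/study",        # placeholder IRI
    "@type": "https://example.org/model/Template",       # placeholder IRI
    "$schema": "http://json-schema.org/draft-04/schema#",
    "type": "object",
    "title": "Study template",
    "properties": {
        "studyName": {
            "type": "object",
            "properties": {"@value": {"type": "string"}},
            "required": ["@value"],
        },
        "disease": {
            "type": "object",
            "properties": {"@id": {"type": "string"}},
            "required": ["@id"],
        },
    },
    "required": ["@context", "studyName", "disease"],
}

# Hypothetical metadata instance: plain JSON-LD that points back to its template.
instance = {
    "@context": {
        "studyName": "https://example.org/properties/studyName",
        "disease": "https://example.org/properties/disease",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "isBasedOn": {"@id": "http://schema.org/isBasedOn", "@type": "@id"},
    },
    "isBasedOn": template["@id"],
    "studyName": {"@value": "A sample study"},
    "disease": {
        "@id": "https://example.org/ontology/acute-stress-disorder",  # placeholder term IRI
        "rdfs:label": "Acute stress disorder",
    },
}


def missing_required(schema: dict, document: dict) -> list:
    """List required keys absent from the document, recursing into nested objects."""
    missing = [key for key in schema.get("required", []) if key not in document]
    for key, sub_schema in schema.get("properties", {}).items():
        if isinstance(document.get(key), dict):
            missing += [f"{key}.{m}" for m in missing_required(sub_schema, document[key])]
    return missing


print(json.dumps(instance, indent=2))
print(missing_required(template, instance))
```

The point of the design is that one document serves two readers: a JSON Schema validator checks the structure, while a JSON-LD processor reads the `@context` and `@id` entries to recover ontology-linked statements.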
12. Infrastructure Layer
[Diagram of the two stores; recoverable labels]
Metadata Repository: Study 1 metadata and Study 2 metadata, each linked by isBasedOn to the BioSample template
Folders, Groups & Permissions: a graph with node types Group, User, Folder, Metadata, and Template
Nodes shown: /, Users, Bob, Studies, Study 1, Study 2, BioSample, Everybody, CEDAR Admin
Relationship types: CONTAINS, OWNS, MEMBEROF, ADMINISTERS, CANREAD
• Folders
• Permissions
• Sharing
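A small in-memory sketch can illustrate how a permission graph of this kind answers "can this user read this resource?". It stands in for the Neo4j store and is not CEDAR's implementation: the class `PermissionGraph`, the specific edges, the user `alice`, and the rule that permissions are inherited from containing folders are all assumptions. Only the relationship names come from the slide.

```python
class PermissionGraph:
    """Tiny triple store standing in for a graph database."""

    def __init__(self):
        self.edges = set()  # (source, relation, target) triples

    def add(self, source, relation, target):
        self.edges.add((source, relation, target))

    def has(self, source, relation, target):
        return (source, relation, target) in self.edges

    def groups_of(self, user):
        return {t for s, r, t in self.edges if s == user and r == "MEMBEROF"}

    def ancestors(self, resource):
        """Return the resource followed by each folder that transitively CONTAINS it."""
        chain = [resource]
        while True:
            parents = [s for s, r, t in self.edges
                       if r == "CONTAINS" and t == chain[-1] and s not in chain]
            if not parents:
                return chain
            chain.append(parents[0])

    def can_read(self, user, resource):
        """Assumed rule: OWNS or CANREAD on the resource or any containing folder."""
        principals = {user} | self.groups_of(user)
        return any(
            self.has(p, "OWNS", node) or self.has(p, "CANREAD", node)
            for node in self.ancestors(resource)
            for p in principals
        )


graph = PermissionGraph()
# Illustrative edges only; the exact edges of the slide's diagram are not recoverable.
graph.add("folder:/", "CONTAINS", "folder:/Users")
graph.add("folder:/Users", "CONTAINS", "folder:/Users/Bob")
graph.add("folder:/Users/Bob", "CONTAINS", "folder:/Users/Bob/Studies")
graph.add("folder:/Users/Bob/Studies", "CONTAINS", "metadata:Study 1")
graph.add("folder:/Users/Bob/Studies", "CONTAINS", "metadata:Study 2")
graph.add("user:bob", "OWNS", "folder:/Users/Bob")
graph.add("user:bob", "MEMBEROF", "group:Everybody")
graph.add("user:alice", "MEMBEROF", "group:Everybody")
graph.add("user:cedar-admin", "ADMINISTERS", "group:Everybody")
graph.add("user:cedar-admin", "OWNS", "template:BioSample")
graph.add("group:Everybody", "CANREAD", "template:BioSample")

print(graph.can_read("user:bob", "metadata:Study 1"))
print(graph.can_read("user:alice", "template:BioSample"))
print(graph.can_read("user:alice", "metadata:Study 1"))
```

Sharing then amounts to adding one edge, for example a CANREAD edge from a group to a folder, which is the kind of traversal-heavy query a graph database handles naturally.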
14. Services Layer
• Resource Service – core metadata repository service
• Terminology Service – ontology repositories
• Value Recommender Service – metadata recommendations
• Submission Service – submission to public repositories
db1, db2, …, dbn
20. Metadata Editor
• Fill in
• Validate
Martínez-Romero, M. et al.: Fast and accurate metadata authoring
using ontology-based recommendations. Proceedings of AMIA 2017
Annual Symposium, 2017.
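The idea of recommending values while a form is being filled in can be sketched with simple conditional frequencies. This is a stand-in, not the method of the cited paper: the function `recommend_values`, the field names, and the sample records are all hypothetical.

```python
from collections import Counter


def recommend_values(records, target_field, context, top_n=3):
    """Rank values of target_field among records that match every context field."""
    matching = [
        r for r in records
        if target_field in r and all(r.get(k) == v for k, v in context.items())
    ]
    counts = Counter(r[target_field] for r in matching)
    total = sum(counts.values())
    # Each suggestion carries its relative frequency among the matching records.
    return [(value, count / total) for value, count in counts.most_common(top_n)]


# Hypothetical previously submitted metadata records.
records = [
    {"disease": "hepatitis", "tissue": "liver"},
    {"disease": "hepatitis", "tissue": "liver"},
    {"disease": "hepatitis", "tissue": "blood"},
    {"disease": "asthma", "tissue": "lung"},
]

# The user has already entered a disease; suggest values for the tissue field.
print(recommend_values(records, "tissue", {"disease": "hepatitis"}))
```

Ranking by what earlier authors entered in the same context lets the editor offer the most likely values first, so the author picks from a short list instead of typing free text.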
24. Summary
• Authoring metadata is hard and time-consuming
• Authoring semantic metadata is even harder
• The CEDAR Workbench provides a pipeline for creating high-quality, semantically rich metadata
Key technology choices:
• Template Model
• JSON Schema
• JSON-LD
• Neo4j
• Microservices
• Docker