How open is open? An evaluation rubric for public knowledgebases

Presented at the 2017 International Biocuration Conference.
Data relevant to any given scientific investigation is highly decentralized across thousands of specialized databases. Within the Biocuration community, we recognize that the value of open scientific knowledge bases is that they make scientific knowledge easier to find and compute, thereby maximizing impact and minimizing waste. The ever-increasing number of databases forces us to ask what our priorities are with respect to maintaining them, developing new ones, or senescing/subsuming ones that have completed their mission. Therefore, open biomedical data repositories should be carefully evaluated according to the quality, accessibility, and value of the database resources over time and across the translational divide.

Traditional citation counts and publication impact factors are known to be inadequate measures of the usefulness of a resource. This is especially true for integrative resources. For example, almost everyone in biomedicine relies on PubMed, but almost no one ever cites or mentions it in their publications. While the Nucleic Acids Research Database issues have increased citation of some databases, many still go unpublished or uncited; even novel derivations of methodology, applications, and workflows from biomedical knowledge bases are often “adapted” but never cited. There is a lack of citation best practices for widely used biomedical database resources (e.g. should a paper be cited? A URL? Is mention of the name and access date sufficient?).

We have developed a draft rubric for evaluating open science databases according to the commonly cited FAIR principles (Findable, Accessible, Interoperable, and Reusable), extended with three additional principles: Traceable, Licensed, and Connected. These three additions are largely overlooked and underappreciated, yet are critical to reuse of the knowledge contained within any given database. It is worth noting that FAIR principles apply not only to the resource as a whole, but also to its key components; this “fractal FAIRness” means that even the license, identifiers, vocabularies, and APIs themselves must be Findable, Accessible, Interoperable, Reusable, etc. Here we report on initial testing of our evaluation rubric on the recent NIH/Wellcome Trust Open Science projects and seek community input for how to further advance this rubric as a Biocuration community resource.
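
As a rough illustration of how ratings against such a rubric might be recorded, here is a minimal Python sketch. Only the seven criterion names come from the abstract above. The 0 to 2 scoring scale, the `Evaluation` class, its methods, and the example resource and ratings are all assumptions made for illustration; they are not the authors' actual rubric or scoring scheme.

```python
from dataclasses import dataclass, field

# The seven FAIR-TLC criteria named in the abstract.
CRITERIA = (
    "Findable", "Accessible", "Interoperable", "Reusable",
    "Traceable", "Licensed", "Connected",
)

# Assumed scale (not from the source): 0 = not met, 1 = partly met, 2 = met.
MAX_SCORE = 2


@dataclass
class Evaluation:
    """Hypothetical record of one resource's rating per criterion."""
    resource: str
    ratings: dict = field(default_factory=dict)

    def rate(self, criterion: str, score: int) -> None:
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if not 0 <= score <= MAX_SCORE:
            raise ValueError(f"score must be between 0 and {MAX_SCORE}")
        self.ratings[criterion] = score

    def unrated(self) -> list:
        """Criteria that have not been rated yet, in rubric order."""
        return [c for c in CRITERIA if c not in self.ratings]

    def fraction_met(self) -> float:
        """Share of the maximum possible score; unrated criteria count as 0."""
        return sum(self.ratings.values()) / (MAX_SCORE * len(CRITERIA))

    def weakest(self) -> list:
        """Criteria with the lowest rating (unrated criteria count as 0)."""
        scores = {c: self.ratings.get(c, 0) for c in CRITERIA}
        low = min(scores.values())
        return [c for c in CRITERIA if scores[c] == low]


# Made-up ratings for a made-up resource.
ev = Evaluation("example-db")
for name in ("Findable", "Accessible", "Interoperable", "Reusable"):
    ev.rate(name, 2)
ev.rate("Traceable", 1)
ev.rate("Licensed", 0)

print(ev.unrated())
print(round(ev.fraction_met(), 2))
print(ev.weakest())
```

Keeping unrated criteria visible, rather than silently skipping them, is one way to make sure the three TLC additions are not overlooked when a resource is assessed.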



  1. HOW OPEN IS OPEN? AN EVALUATION RUBRIC FOR PUBLIC KNOWLEDGEBASES. MELISSA HAENDEL, MARCH 28TH, 2017, @ontowonka
  2. THERE ARE OVER 1500 PUBLIC DATABASES IN NUCLEIC ACIDS RESEARCH DATABASE COLLECTION https://doi.org/10.1093/nar/gkw1188
  3. HOW MANY OF THESE ARE TRULY OPEN? OPENNESS IS AN NAR REQUIREMENT, BUT …
  4. WHY ARE WE STILL FAILING?
  5. OPEN DATA IS FAIR DATA http://www.nature.com/articles/sdata201618 Findable Accessible Interoperable Reusable
  6. ANATOMY OF FAIR: FINDABLE
     - persistent identifier
     - rich metadata
     - registered or indexed in a searchable resource
     McMurry et al., Identifiers for the 21st century, bit.ly/identifiers-2017
  7. ANATOMY OF FAIR: ACCESSIBLE
     - (meta)data are openly retrievable by their identifier using a standardized communications protocol
     - Metadata are accessible, even when the data are no longer available
     http://api.monarchinitiative.org/api/
  8. ANATOMY OF FAIR: INTEROPERABLE
     - Use a formal, accessible, shared, and broadly applicable language for knowledge representation
     - Define semantics of all relationships, including cross references (hint: use the Relations Ontology!)
  9. ANATOMY OF FAIR: INTEROPERABLE Picking on the Personal Genome Project (thanks Sasha!) Do you have a severe genetic disease or rare genetic trait? If so, you can add a description for your public profile.
     1. Extreme susceptibility to motion sickness. - answers pertain to this trait
     2. Pyloric stenosis
     3. Unusually small feet for my height
  10. ANATOMY OF FAIR: REUSABLE
     - (Meta)data are described with a plurality of accurate and relevant attributes
     - Detailed provenance and use of community standards
     www.obofoundry.org https://www.w3.org/TR/hcls-dataset/ https://peerj.com/articles/2331.pdf
  11. A RUBRIC FOR EVALUATION bit.ly/eval-rfi
  12. Findable Accessible Interoperable Reusable FAIR-TLC Traceable Licensed Connected
  13. FAIR-TLC: TRACEABILITY
     - Provenance is documented and attributed
     - Contributions to the content (data, tools, algorithms, sources, etc.) are declared
     - Documentation on how to cite a record from a source or the whole resource
  14. FAIR-TLC: LICENSURE http://peterdesmet.com/posts/analyzing-gbif-data-licenses.html Not all data resources are free to use, derive, and redistribute, even if they are publicly funded and seemingly publicly available.
  15. FAIR-TLC: LICENSURE http://peterdesmet.com/posts/analyzing-gbif-data-licenses.html Standard license: 171. Non-standard license: 1069. No license: 10734.
  16. NON-STANDARD LICENSES BURDEN SCIENCE bit.ly/reusabledata-forum
  17. FAIR-TLC: CONNECTED BECAUSE AGGREGATED != INTEGRATED
  18. FAIR-TLC: CONNECTED BECAUSE AGGREGATED != INTEGRATED 192K datasets… probably more than 38 are relevant to diabetes
  19. FAIR-TLC: CONNECTED BECAUSE AGGREGATED != INTEGRATED Similarly, clouds do not integrate data. http://stonebond.com/wp-content/uploads/2015/05/cloud-data-bullet-points-img.jpg
  20. EVALUATING THE OPEN SCIENCE CANDIDATES Room for improvement bit.ly/open-science-priz Open imaging
  21. DISCUSSION: HOW DO WE DO BETTER? Make the right thing the easy thing:
     - Carrots:
       - Tenure & promotion cycles
       - Dedicated funding for increasing FAIR-TLC
     - Sticks:
       - Publication requirements
       - Funding requirements
     - Tools:
       - Tracking tools
       - Documentation tools
  22. ARE JOURNAL DATA SHARING POLICIES HITTING THE MARK? Vasilevsky et al. https://doi.org/10.7287/peerj.preprints.2588v1
  23. TOO TINY A STICK? Vasilevsky et al. https://doi.org/10.7287/peerj.preprints.2588v1
  24. REUSABLEDATA.ORG Curate, evaluate, and provide guidance on legal and effective data reuse and redistribution. Wanna help? Join the google group at: Seth Carbon bit.ly/reusabledata-forum
  25. THANKS TO: JULIE MCMURRY, ANDREW SU, SETH CARBON
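
The three license counts on slide 15 are easier to compare as shares of the total. The short Python sketch below uses only the counts shown on that slide; the variable names are illustrative, and the percentages are simply computed from those three numbers rather than taken from the source.

```python
# Counts as shown on slide 15.
license_counts = {
    "standard license": 171,
    "non-standard license": 1069,
    "no license": 10734,
}

total = sum(license_counts.values())

# Percentage of all counted items in each category, rounded to one decimal.
shares = {
    label: round(100 * count / total, 1)
    for label, count in license_counts.items()
}

print(f"total: {total}")
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

On these counts, about 89.6% of the 11974 items carry no license, about 8.9% carry a non-standard one, and about 1.4% carry a standard one.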
