HOW OPEN IS OPEN?
AN EVALUATION RUBRIC FOR PUBLIC KNOWLEDGEBASES
MELISSA HAENDEL
MARCH 28TH, 2017
@ontowonka
THERE ARE OVER 1,500 PUBLIC DATABASES IN THE NUCLEIC ACIDS RESEARCH (NAR) DATABASE COLLECTION
https://doi.org/10.1093/nar/gkw1188
HOW MANY OF THESE ARE TRULY OPEN?
OPENNESS IS AN NAR REQUIREMENT, BUT …
WHY ARE WE STILL FAILING?
OPEN DATA IS FAIR DATA
http://www.nature.com/articles/sdata201618
Findable, Accessible, Interoperable, Reusable
ANATOMY OF FAIR:
FINDABLE
- Persistent identifier
- Rich metadata
- Registered or indexed in a searchable resource
McMurry et al., "Identifiers for the 21st century"
bit.ly/identifiers-2017
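Persistent identifiers are often written as compact URIs (CURIEs) such as HP:0001250 and expanded into resolvable IRIs through a prefix map. A minimal sketch, assuming a small hand-written prefix map: the two entries follow the OBO PURL pattern, but the map itself is illustrative, and real resources rely on curated prefix registries.

```python
# Illustrative prefix map (an assumption); real projects use a curated registry.
PREFIX_MAP = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
}


def expand_curie(curie: str, prefix_map: dict[str, str] = PREFIX_MAP) -> str:
    """Expand a CURIE such as 'HP:0001250' into a full, resolvable IRI."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or not local_id:
        raise ValueError(f"not a CURIE: {curie!r}")
    if prefix not in prefix_map:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return prefix_map[prefix] + local_id


def contract_iri(iri: str, prefix_map: dict[str, str] = PREFIX_MAP) -> str:
    """Contract a full IRI back to a CURIE, preferring the longest matching base."""
    matches = [(p, base) for p, base in prefix_map.items() if iri.startswith(base)]
    if not matches:
        raise KeyError(f"no prefix for IRI: {iri!r}")
    prefix, base = max(matches, key=lambda m: len(m[1]))
    return f"{prefix}:{iri[len(base):]}"
```

expand_curie("HP:0001250") returns "http://purl.obolibrary.org/obo/HP_0001250"; contract_iri reverses it, so the short and long forms stay interchangeable.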
ANATOMY OF FAIR:
ACCESSIBLE
- (Meta)data are openly retrievable by their identifier using a standardized communications protocol
- Metadata are accessible even when the data are no longer available
http://api.monarchinitiative.org/api/
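The second principle means an identifier should keep resolving to its metadata after the data themselves are withdrawn. A minimal in-memory sketch, assuming a hypothetical Resolver class and a "tombstone" convention (status and reason fields); these names are illustrative and not part of the FAIR text or of any real API. A web service would expose the same behavior over HTTP, for example by returning the metadata with a 410 Gone status.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    identifier: str
    metadata: dict
    data: Optional[bytes]  # None once the data have been withdrawn


class Resolver:
    """Hypothetical resolver in which metadata outlive the data they describe."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def register(self, identifier: str, metadata: dict, data: bytes) -> None:
        self._records[identifier] = Record(identifier, dict(metadata), data)

    def withdraw(self, identifier: str, reason: str) -> None:
        # Drop the data but keep and annotate the metadata (a "tombstone").
        record = self._records[identifier]
        record.data = None
        record.metadata["status"] = "withdrawn"
        record.metadata["withdrawal_reason"] = reason

    def resolve(self, identifier: str) -> dict:
        # Resolution succeeds for withdrawn records; only unknown IDs fail.
        record = self._records.get(identifier)
        if record is None:
            raise KeyError(f"unknown identifier: {identifier!r}")
        return {
            "identifier": record.identifier,
            "metadata": dict(record.metadata),
            "data_available": record.data is not None,
        }
```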
ANATOMY OF FAIR:
INTEROPERABLE
- Use a formal, accessible, shared, and broadly applicable language for knowledge representation
- Define the semantics of all relationships, including cross references (hint: use the Relations Ontology!)
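An untyped cross reference only says two records are "related"; a typed one says how. A minimal sketch, assuming cross references are stored as (subject, predicate, object) triples and using the W3C SKOS mapping properties as predicates with defined semantics. The Relations Ontology plays the same role for biological relationships; the ex: identifiers are placeholders.

```python
from typing import NamedTuple

SKOS = "http://www.w3.org/2004/02/skos/core#"

# Only predicates with defined semantics are accepted.
DEFINED_PREDICATES = {
    SKOS + "exactMatch",
    SKOS + "closeMatch",
    SKOS + "broadMatch",
    SKOS + "narrowMatch",
}


class Xref(NamedTuple):
    subject: str
    predicate: str
    obj: str


def typed_xref(subject: str, predicate: str, obj: str) -> Xref:
    """Build a cross reference only if its predicate has defined semantics."""
    if predicate not in DEFINED_PREDICATES:
        raise ValueError(f"undefined relationship: {predicate!r}")
    return Xref(subject, predicate, obj)


def exact_matches(xrefs: list[Xref], term: str) -> set[str]:
    """Terms asserted as exact matches of `term`; SKOS exactMatch is symmetric."""
    found = set()
    for x in xrefs:
        if x.predicate != SKOS + "exactMatch":
            continue
        if x.subject == term:
            found.add(x.obj)
        elif x.obj == term:
            found.add(x.subject)
    return found
```

Because the predicate carries meaning, a consumer can treat a broadMatch differently from an exactMatch instead of guessing what a bare xref implies.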
ANATOMY OF FAIR:
INTEROPERABLE
Picking on the Personal Genome Project (thanks Sasha!)
Do you have a severe genetic disease or rare genetic trait? If so, you can add a description for your public profile.
1. Extreme susceptibility to motion sickness. - answers pertain to this trait
2. Pyloric stenosis
3. Unusually small feet for my height
ANATOMY OF FAIR:
REUSABLE
- (Meta)data are described with a plurality of accurate and relevant attributes
- Detailed provenance and use of community standards
www.obofoundry.org
https://www.w3.org/TR/hcls-dataset/
https://peerj.com/articles/2331.pdf
A RUBRIC FOR EVALUATION
bit.ly/eval-rfi
Findable, Accessible, Interoperable, Reusable
FAIR-TLC
Traceable, Licensed, Connected
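One way to apply a rubric like this is as a checklist with a few yes/no criteria per principle. A minimal sketch: the criteria are paraphrased from these slides, and the equal-weight scoring is an assumption, not the published rubric at bit.ly/eval-rfi.

```python
# Criteria paraphrased from the slides; equal weighting is an assumption.
RUBRIC: dict[str, list[str]] = {
    "Findable": ["persistent identifier", "rich metadata", "indexed in a searchable resource"],
    "Accessible": ["retrievable by identifier via standard protocol", "metadata outlive data"],
    "Interoperable": ["formal shared knowledge representation", "relationship semantics defined"],
    "Reusable": ["accurate, relevant attributes", "provenance and community standards"],
    "Traceable": ["provenance attributed", "contributions declared", "citation guidance"],
    "Licensed": ["standard license declared"],
    "Connected": ["integrated, not merely aggregated"],
}


def score(answers: dict[str, set[str]]) -> dict[str, float]:
    """Fraction of criteria met per principle; `answers` maps principle -> criteria met."""
    result = {}
    for principle, criteria in RUBRIC.items():
        met = answers.get(principle, set()) & set(criteria)
        result[principle] = len(met) / len(criteria)
    return result
```

Scoring each principle separately keeps a resource's weak spots visible instead of folding them into a single openness number.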
FAIR-TLC:
TRACEABILITY
- Provenance is documented and attributed
- Contributions to the content (data, tools, algorithms, sources, etc.) are declared
- Documentation on how to cite an individual record from a source, or the resource as a whole
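Citation guidance is easiest to follow when a resource can generate citations itself, at both levels. A minimal sketch, assuming a hypothetical Resource class and citation format (neither is a standard): a record citation reuses the resource citation so it stays pinned to the version and contributors it came from.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Resource:
    name: str
    version: str
    url: str
    contributors: list[str] = field(default_factory=list)  # declared contributions

    def cite(self, accessed: date) -> str:
        """Citation for the resource as a whole (hypothetical format)."""
        who = ", ".join(self.contributors) or "Anonymous"
        return f"{who}. {self.name} (version {self.version}). {self.url}. Accessed {accessed.isoformat()}."

    def cite_record(self, record_id: str, accessed: date) -> str:
        """Citation for one record, tied to the resource version it came from."""
        return f"{record_id}. In: {self.cite(accessed)}"
```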
FAIR-TLC: LICENSURE
http://peterdesmet.com/posts/analyzing-gbif-data-licenses.html
Not all data resources are free to use, derive from, and redistribute, even if they are publicly funded and seemingly publicly available.
Standard license: 171
Non-standard license: 1069
No license: 10734
NON-STANDARD LICENSES BURDEN SCIENCE
bit.ly/reusabledata-forum
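The license counts above are easier to read as shares. Of the 11,974 counted, roughly 89.6% have no license, 8.9% a non-standard one, and 1.4% a standard one. A minimal sketch of the arithmetic, plus a classifier that treats a license as "standard" when its field is a recognized SPDX identifier. That definition and the three-identifier allow-list are assumptions for illustration, not the method behind the original analysis.

```python
from typing import Optional

# A few real SPDX identifiers; the allow-list is illustrative, not exhaustive.
STANDARD_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-NC-4.0"}


def classify(license_field: Optional[str]) -> str:
    """Bucket a free-text license field into the three categories on the slide."""
    if license_field is None or not license_field.strip():
        return "no license"
    if license_field.strip() in STANDARD_LICENSES:
        return "standard license"
    return "non-standard license"


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Percentage of the total in each category."""
    total = sum(counts.values())
    return {k: 100 * v / total for k, v in counts.items()}


# Counts as shown on the slide.
SLIDE_COUNTS = {"standard license": 171, "non-standard license": 1069, "no license": 10734}
```

A free-text license string such as "free for academic use" lands in the non-standard bucket, which is exactly the case that forces every reuser to interpret the terms by hand.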
FAIR-TLC: CONNECTED
BECAUSE AGGREGATED != INTEGRATED
192K datasets... probably more than 38 of them are relevant to diabetes
Similarly, clouds do not integrate data.
http://stonebond.com/wp-content/uploads/2015/05/cloud-data-bullet-points-img.jpg
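The diabetes example can be made concrete. In a merely aggregated catalog, a keyword query matches only records whose free text contains the word. Once records are annotated to a shared hierarchy of terms, a query can also retrieve records tagged with more specific subtypes. A minimal sketch with a hypothetical four-record catalog and a hypothetical two-level hierarchy (plain labels, not real ontology identifiers):

```python
# Hypothetical catalog: free-text titles plus ontology-style tags.
CATALOG = [
    {"title": "Diabetes cohort study", "tags": {"diabetes"}},
    {"title": "Type 1 diabetes GWAS", "tags": {"type 1 diabetes"}},
    {"title": "Juvenile-onset insulin-dependent patients", "tags": {"type 1 diabetes"}},
    {"title": "MODY families", "tags": {"maturity-onset diabetes of the young"}},
]

# Hypothetical hierarchy: child term -> parent term.
PARENT = {
    "type 1 diabetes": "diabetes",
    "maturity-onset diabetes of the young": "diabetes",
}


def ancestors_or_self(term: str) -> set[str]:
    """The term plus every ancestor reachable through PARENT."""
    terms = {term}
    while term in PARENT:
        term = PARENT[term]
        terms.add(term)
    return terms


def keyword_search(query: str) -> list[str]:
    """Aggregated: match on free text only."""
    return [r["title"] for r in CATALOG if query.lower() in r["title"].lower()]


def ontology_search(term: str) -> list[str]:
    """Integrated: match records tagged with the term or any of its subtypes."""
    return [
        r["title"]
        for r in CATALOG
        if any(term in ancestors_or_self(tag) for tag in r["tags"])
    ]
```

Here keyword_search("diabetes") finds two records and ontology_search("diabetes") finds all four; the two extra records were in the collection all along, just not connected.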
EVALUATING THE OPEN SCIENCE CANDIDATES
Room for improvement
bit.ly/open-science-priz
Open imaging
DISCUSSION:
HOW DO WE DO BETTER?
Make the right thing the easy thing:
- Carrots:
  - Tenure & promotion cycles
  - Dedicated funding for increasing FAIR-TLC
- Sticks:
  - Publication requirements
  - Funding requirements
- Tools:
  - Tracking tools
  - Documentation tools
ARE JOURNAL DATA SHARING POLICIES HITTING THE MARK?
Vasilevsky et al.
https://doi.org/10.7287/peerj.preprints.2588v1
TOO TINY A STICK?
Vasilevsky et al.
https://doi.org/10.7287/peerj.preprints.2588v1
REUSABLEDATA.ORG
Curate, evaluate, and provide guidance on legal and effective data reuse and redistribution
Wanna help? Join the Google group at:
bit.ly/reusabledata-forum
Seth Carbon
THANKS TO:
JULIE MCMURRY
ANDREW SU
SETH CARBON