Annotation as a curation tool for RRIDs
Anita Bandrowski
Force11: April 2016
Definitions
• RRIDs (Research Resource Identifiers) are unique persistent identifiers
that authors cite during the publication process, mainly in neuroscience;
they are obtained from scicrunch.org/resources.
• SciBot is a simple application that hooks into Hypothes.is: it takes
text from Hypothes.is, finds RRIDs in that text, looks up information
about each RRID in the SciCrunch resolver, and feeds that information
back to Hypothes.is.
• The SciCrunch resolver resolves RRIDs. It builds on the Neuroscience
Information Framework's work indexing multiple databases and aligning
and structuring their data to make it compliant.
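The SciBot workflow described above (find RRIDs in text, look each one up in the resolver, pass the result back to Hypothes.is) can be sketched in a few lines of Python. This is a minimal offline sketch, not SciBot's actual code: the regular expression is an assumed approximation of RRID syntax, and the resolver URL pattern in `RESOLVER_BASE` is an assumption to check against the SciCrunch documentation. The sketch only builds the URLs a real bot would fetch; it makes no network calls and does not post anything to Hypothes.is. The identifiers in the sample text are illustrative.

```python
import re

# Assumed approximation of RRID syntax: "RRID:" followed by a registry
# prefix, an underscore, and an accession (e.g. AB_..., SCR_..., or
# IMSR_JAX:...). Real RRIDs may need a more permissive pattern.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9][A-Za-z0-9_:\-]*)")

# Hypothetical resolver base URL; verify against SciCrunch before use.
RESOLVER_BASE = "https://scicrunch.org/resolver/"


def find_rrids(text: str) -> list[str]:
    """Return normalized RRIDs ("RRID:<id>") in order of first appearance."""
    seen: list[str] = []
    for match in RRID_PATTERN.finditer(text):
        # Drop trailing separators that the character class can swallow.
        rrid = "RRID:" + match.group(1).rstrip(":-")
        if rrid not in seen:
            seen.append(rrid)
    return seen


def resolver_urls(rrids: list[str]) -> dict[str, str]:
    """Map each RRID to the (assumed) resolver URL a bot would query."""
    return {rrid: RESOLVER_BASE + rrid for rrid in rrids}


if __name__ == "__main__":
    methods = (
        "Sections were stained with an antibody (RRID:AB_2298772) and "
        "analysed with imaging software (RRID: SCR_003070). The same "
        "antibody (RRID:AB_2298772) was used for controls."
    )
    found = find_rrids(methods)
    print(found)  # ['RRID:AB_2298772', 'RRID:SCR_003070']
    for rrid, url in resolver_urls(found).items():
        print(rrid, "->", url)
```

Keeping first-seen order and dropping repeats means each RRID would be looked up once per paper; identifiers that the pattern misses, or that do not resolve, still need a human curator.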
RRID Curation task:
• Find all papers that used RRIDs
• Find papers that have enough information to auto-generate RRIDs*
• Determine whether the author is using the correct identifier
• Attach curation tags to RRIDs, or to their absence
• Put the data into SciCrunch systems
• Optionally, release the data
• Tools: Hypothes.is and SciBot
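The first curation steps (finding papers that use RRIDs and flagging cited identifiers that may be wrong) amount to a triage pass over paper text. The sketch below is a hypothetical illustration: a local set of known RRIDs stands in for queries to the SciCrunch resolver, the regular expression is an assumed approximation of RRID syntax, and `triage`, `TriageResult`, and the status strings are invented names. The final RRIDCUR tag is still left to a curator; the sample identifiers and catalog number are illustrative.

```python
import re
from dataclasses import dataclass, field

# Assumed approximation of RRID syntax; real RRIDs may need a broader pattern.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9][A-Za-z0-9_:\-]*)")


@dataclass
class TriageResult:
    """Hypothetical per-paper record feeding the curation queue."""

    paper_id: str
    uses_rrids: bool
    # RRID -> provisional status; a curator assigns the final RRIDCUR tag.
    rrid_status: dict[str, str] = field(default_factory=dict)


def triage(papers: dict[str, str], known_rrids: set[str]) -> list[TriageResult]:
    """Flag which papers cite RRIDs and which cited RRIDs are unknown.

    `known_rrids` is a local stand-in for queries to the SciCrunch resolver.
    """
    results = []
    for paper_id, text in papers.items():
        cited = ["RRID:" + m.group(1).rstrip(":-") for m in RRID_PATTERN.finditer(text)]
        # A dict keyed by RRID also collapses repeated citations.
        status = {
            rrid: "resolves" if rrid in known_rrids else "does-not-resolve"
            for rrid in cited
        }
        results.append(TriageResult(paper_id, bool(cited), status))
    return results


if __name__ == "__main__":
    papers = {
        "paper-1": "Stained with an antibody (RRID:AB_2298772); analysis used RRID:SCR_999999.",
        # No RRID, but vendor and catalog number might allow one to be generated.
        "paper-2": "Stained with an antibody (vendor X, cat. no. 1234).",
    }
    for result in triage(papers, known_rrids={"RRID:AB_2298772"}):
        print(result)
```

Papers with `uses_rrids` set to `False` are the candidates for the "lack thereof" tags, and for checking whether enough information is present to generate an RRID.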
What does curation as annotation look like?
Annotations consist of:
• Hypothes.is: the URL of the article page
• Hypothes.is: tagged text in the article
• SciBot: any resolved or suspected RRIDs
• SciBot: a lookup of every resolved RRID, including resource type, links
to the source database, etc.
• Curators: the PubMed ID is always annotated, if it exists
• Curators: the paper is read, resources are identified, and RRIDs are
assigned to each section of text, whether or not that section of the
paper already contains RRIDs
• Curators: RRIDCUR tags are attached to each RRID (was the author
correct?)
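A single curated annotation from the list above could be represented as a JSON payload. The layout below is modelled loosely on Hypothes.is annotations (`uri`, `text`, `tags`, and a `target` carrying a W3C `TextQuoteSelector`); the field names should be checked against the Hypothes.is API documentation before use. Putting the PubMed ID and the RRIDCUR tag into `tags` is an assumed convention, and `RRIDCUR:Validated` is a placeholder value, since the slides do not list the actual tag vocabulary. The URL and PubMed ID are hypothetical.

```python
import json
from typing import Optional


def build_annotation(
    article_url: str,
    exact_text: str,
    rrid: str,
    curation_tag: str,
    pmid: Optional[str] = None,
    resolver_note: str = "",
) -> dict:
    """Assemble one curation annotation as a Hypothes.is-style payload.

    The field layout is an assumption modelled on Hypothes.is annotations.
    """
    tags = [rrid, curation_tag]
    if pmid is not None:
        # Assumed convention: record the PubMed ID as a tag when it exists.
        tags.append(f"PMID:{pmid}")
    return {
        "uri": article_url,
        "text": resolver_note,  # e.g. a summary of the resolver lookup
        "tags": tags,
        "target": [
            {
                "source": article_url,
                # W3C Web Annotation selector anchoring the note to quoted text.
                "selector": [{"type": "TextQuoteSelector", "exact": exact_text}],
            }
        ],
    }


if __name__ == "__main__":
    annotation = build_annotation(
        article_url="https://example.org/articles/123",  # hypothetical URL
        exact_text="RRID:AB_2298772",
        rrid="RRID:AB_2298772",
        curation_tag="RRIDCUR:Validated",  # placeholder tag value
        pmid="12345678",  # hypothetical PubMed ID
    )
    print(json.dumps(annotation, indent=2))
```

Anchoring each note to quoted text rather than to a character offset is what lets the annotation survive minor changes to the article page.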
Come see this live: Booth 40
Where are we?
…about halfway!
Once curated, data can be used by anyone
http://www.sciencedirect.com/science/article/pii/S0969996114002526
W3C Open Data for RRIDs
• Together with Hypothes.is, we will make the complete RRID data set
available in the next few months, which may let us ask some new
questions.
• Are there questions you would ask of these data?
• If so, which questions?
• If not, when would these data become useful?
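As one example of a question the released data could answer, the sketch below estimates how often each curation outcome occurs by counting RRIDCUR tags. It assumes, hypothetically, that the released data set is a list of annotation records, each with a `tags` list of strings; the real release format is not described here, and the tag values in the sample are placeholders.

```python
from collections import Counter


def curation_summary(annotations: list[dict], prefix: str = "RRIDCUR:") -> dict[str, float]:
    """Return the share of curation tags that carry each RRIDCUR value.

    Assumes each record has an optional "tags" list of strings; tags that
    do not start with `prefix` (RRIDs, PubMed IDs, ...) are ignored.
    """
    counts = Counter(
        tag
        for ann in annotations
        for tag in ann.get("tags", [])
        if tag.startswith(prefix)
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


if __name__ == "__main__":
    sample = [
        {"tags": ["RRID:AB_2298772", "RRIDCUR:Validated"]},
        {"tags": ["RRID:SCR_003070", "RRIDCUR:Validated"]},
        {"tags": ["RRID:SCR_999999", "RRIDCUR:Incorrect"]},
        {"tags": ["PMID:12345678"]},
    ]
    for tag, share in sorted(curation_summary(sample).items()):
        print(f"{tag}: {share:.0%}")
```

The same grouping could be repeated per journal or per year to ask whether authors' use of correct identifiers changes over time.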
