BRIF workshop Toulouse 2012 Digital IDs subgroup


  1. 1. BRIF Digital identifiers subgroup
     Gudmundur A. Thorisson <gt50@leicester.ac.uk>, GEN2PHEN / University of Leicester
     Pierre-Antoine Gourraud <pierreantoine.gourraud@ucsf.edu>, UCSF
     -- Overview --
     ‣ Brief backgrounder on identification & digital identifiers
     ‣ Use cases for bio-resource identification in BRIF
     ‣ Digital resources: datasets, databases (Mummi)
     ‣ Non-digital resources: projects, studies, cohorts [...] (Pierre)
     ‣ Conclusions and next steps
     This work is published under the Creative Commons Attribution license (CC BY: http://creativecommons.org/licenses/by/3.0/), which means that it can be freely copied, redistributed and adapted, as long as proper attribution is given.
     Monday, 22 October 12
  2. 2. BRIF and bio-resource identification
     • The identification requirement: need to identify resources in order to
       – track use/reuse and impact
       – credit those who contribute to them
     • Biobanking projects have relied on:
       – Project/study/cohort names
         • Example: the GAZEL study in France (>20 years), http://www.gazel.inserm.fr
         • Challenges:
           - ad hoc agreements with research groups who reuse samples or data
           - painstaking manual searching through literature for mentions of ‘GAZEL’
           - project names are often ambiguous in a global context
     BRIF workshop, Toulouse Oct 22 2012
  4. 4. BRIF and bio-resource identification
     • The identification requirement: need to identify resources in order to
       – track use/reuse and impact
       – credit those who contribute to them
     • Example: biobanking projects frequently rely on...
       – Project/study/cohort names
         • Example: the GAZEL study in France (>20 years), http://www.gazel.inserm.fr
         • Challenges:
           - ad hoc agreements with research groups who reuse samples or data
           - painstaking manual searching through literature for mentions of ‘GAZEL’
           - project names are often ambiguous in a global context
       – Citations to journal publications
         • Which paper to cite? It is tricky to keep track of which citations are relevant to impact
         • Also troublesome if there is no paper to cite (e.g. for a new study)
  5. 5. Digital identifiers - some background
     • Definition: a digital identifier is a character string used to uniquely identify
       i) a digital object in a computer system, or
       ii) a record in a computer system which describes a non-digital object
     • Persistence - once assigned, an identifier MUST NOT change
     • Uniqueness - global scope vs local scope
       – Most ID schemes require tacit knowledge of the type of identifier in order to interpret it
         • Example: EC grant identifiers in acknowledgement statements
  6. 6. This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
  7. 7. This work has received funding under grant agreement number 200754
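The stripped-down statement above shows why a locally scoped identifier needs context: the bare number only identifies a grant if the reader already knows which scheme issued it. A minimal Python sketch of that idea follows. The `ScopedId` type and the scheme labels (`ec-fp7-grant`, `other-funder-grant`) are hypothetical and chosen for illustration only; they are not an existing standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedId:
    """A local identifier paired with the scheme that gives it meaning.

    Hypothetical type: both the class and the scheme labels are assumptions.
    """
    scheme: str    # which ID scheme issued the value (assumed label)
    local_id: str  # the bare identifier as printed in the text

    def qualified(self) -> str:
        """Return the identifier with its scheme made explicit."""
        return f"{self.scheme}:{self.local_id}"


# The bare number from the acknowledgement statement, with no context.
bare = "200754"

# Two different schemes could both issue the same bare number.
ec_grant = ScopedId("ec-fp7-grant", bare)
other = ScopedId("other-funder-grant", bare)

print(ec_grant.local_id == other.local_id)  # the bare values collide
print(ec_grant == other)                    # the scheme-qualified values do not
print(ec_grant.qualified())
```

Making the scheme part of the identifier is one simple way to move from local to global scope; it does not by itself make the identifier persistent or resolvable.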
  8. 8. Digital identifiers - some background
     • Definition: a digital identifier is a character string used to uniquely identify
       i) a digital object in a computer system, or
       ii) a record in a computer system which describes a non-digital object
     • Persistence - once assigned, an identifier MUST NOT change
     • Uniqueness - global scope vs local scope
       – Most ID schemes require tacit knowledge of the type of identifier in order to interpret it
         • Example: EC grant identifiers
     • Some problem domains require globally unique IDs
       – Example: ISBN numbers to identify books, e.g. for copyright purposes
     • Some problem domains require resolvable IDs
       – Resolve = retrieve information about the thing being identified, including where to access it (for a digital object, its location on the Internet)
       – Digital Object Identifiers (DOIs) are the best known, but several other systems exist
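As a rough illustration of a resolvable identifier, the sketch below splits a DOI name into its prefix and suffix and builds the URL of the public DOI proxy at doi.org, which redirects to the current location of the identified object. The regular expression is a simplified assumption about DOI syntax rather than the full specification, and the example DOI is made up and not registered.

```python
import re
from typing import Tuple
from urllib.parse import quote

# Simplified pattern: "10.<registrant code>/<suffix>". Real DOI syntax is
# more permissive; this is an assumption that covers the common case.
DOI_PATTERN = re.compile(r"^(10\.\d{4,9})/(\S+)$")


def parse_doi(doi: str) -> Tuple[str, str]:
    """Split a DOI name into (prefix, suffix); raise ValueError if malformed."""
    match = DOI_PATTERN.match(doi.strip())
    if match is None:
        raise ValueError(f"not a DOI name: {doi!r}")
    return match.group(1), match.group(2)


def resolver_url(doi: str) -> str:
    """Build the doi.org proxy URL for a DOI name (no network access here)."""
    prefix, suffix = parse_doi(doi)
    # Percent-encode the suffix so reserved characters survive inside a URL.
    return f"https://doi.org/{prefix}/{quote(suffix, safe='/')}"


example = "10.1234/example-dataset.v1"  # made-up DOI, not a registered one
print(parse_doi(example))
print(resolver_url(example))
```

The code only constructs the URL; actually following it would require an HTTP client and a registered DOI.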
  10. 10. Identifier use cases in BRIF
      • 3 broad categories of “stuff” to identify
        i) Digital resources
           Resources that actually “live” in computers (born-digital or digitized content): datasets and databases
        ii) Physical resources
           Resources corresponding to actual physical things: samples, groups of samples, experimental instruments, etc.
        iii) Project-level and other “meta” resources
           Higher-level aggregates of things: projects, organizations, consortia, etc.
      NB: in many cases identifiers already exist for these things, but they are not exposed to the outside world in a usable form (i.e. made resolvable, citable, globally unique).
  11. 11. Datasets
      • Definition: a data set (or dataset) is a collection of data, often presented in tabular form but in the bio-sciences also frequently in a multitude of domain-specific formats, such as FASTA for biological sequences
      • Data publication and data citation are a hot topic - lots of research and infrastructure-building activity in recent years
      • Emerging best practices for data citation & attribution
      • Identifiers for datasets - persistent data DOIs issued via DataCite
      • Little new for BRIF to add here, except to issue recommendations
        – KEY POINT: infrastructure for data preservation and access is a prerequisite for any sort of persistent bio-dataset identification scheme. Many projects don’t have this!
  12. 12. Data DOI scenario (simplified)
      1. Research group registers a dataset and metadata in a suitable domain repository (or their own repository)
      2. Repository archives the dataset and assigns a DOI name to it
      3. Unique DOI name is used by article authors (and others) to indicate resource reuse (ideally via formal data citation)
      4. Journal article reference listings, full text and other sources are mined to identify references to the dataset and/or downloads
      5. Dataset-level metrics are calculated from the collected data, e.g.
         - total no. citations in scholarly articles
         - no. secondary citations (citations to papers which cited the original dataset)
         - no. downloads in the last 2 years
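Step 5 of the scenario above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the `Citation` record, the `dataset_metrics` function, the article labels and the DOI are all hypothetical, the input is assumed to be the already-mined output of step 4, and "the last 2 years" is approximated as 730 days.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Citation:
    citing_work: str   # ID of the work whose reference list was mined
    cited_target: str  # what it cites: the dataset DOI or another work


def dataset_metrics(dataset_doi: str,
                    citations: List[Citation],
                    download_dates: Iterable[date],
                    today: date) -> Dict[str, int]:
    """Compute the three example metrics for one dataset."""
    # Works that cite the dataset directly.
    direct = {c.citing_work for c in citations if c.cited_target == dataset_doi}
    # Works that cite one of those works, excluding works already counted.
    secondary = {c.citing_work for c in citations
                 if c.cited_target in direct} - direct
    # "Last 2 years" approximated as 730 days (an assumption).
    cutoff = today - timedelta(days=730)
    recent_downloads = sum(1 for d in download_dates if d >= cutoff)
    return {
        "citations": len(direct),
        "secondary_citations": len(secondary),
        "downloads_last_2y": recent_downloads,
    }


doi = "10.1234/example-dataset.v1"  # made-up DOI
mined = [
    Citation("article-A", doi),
    Citation("article-B", doi),
    Citation("article-E", doi),
    Citation("article-C", "article-A"),  # secondary: cites a citing paper
    Citation("article-B", "article-A"),  # B already cites the dataset directly
]
downloads = [date(2009, 5, 1), date(2011, 3, 14), date(2012, 9, 30)]

metrics = dataset_metrics(doi, mined, downloads, today=date(2012, 10, 22))
print(metrics)
```

Counting distinct citing works, rather than raw citation records, keeps a paper that cites both the dataset and another citing paper from being counted twice.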
  13. 13. ORCID and DataCite Interoperability Network
      • Persistent identifiers for connecting people and datasets
      • 2-year EC-funded project, 7 partners in Europe + USA
      • Two main proof-of-concept pilots
        – Social science data - use and citation of British Birth Cohort Studies
          • historical data, decades old, steadily being curated by lots of different people
          • high rate of reuse, often cited in papers
        – High-energy physics (HEP) - attribution challenges
          • dealing with the large no. authors on HEP papers - ‘dilution’ of the term authorship
          • linking HEP papers to supporting datasets
      http://odin-project.eu/
  14. 14. Databases
      • Definition: an online database can be regarded as a collection of data, but made accessible in a way that facilitates using the data to answer scientific questions, via structured querying and/or free-text searching of the data over the Internet
      • Broad range, from large-scale DNA and protein sequence repositories to small locus-specific databases
        – E.g. GenBank, UniProt, GWAS Central, Ehlers-Danlos Syndrome Variant Database
      • Challenges in assessing impact & attributing curators
        – Reliance on citations to the database paper, if there is one (sometimes many)
          • Analyzing website traffic is another indicator - highly-accessed database =~ important
        – Database URLs sometimes change
        – Database name + URL often mentioned only in materials & methods, with no citation
        – Credit via authorship is impossible if there is no database journal paper
  15. 15. BioDBCore - global catalogue of bio-db’s
      • BioDBCore aims
        – annotation - organize the bio-database ‘resourceome’
        – discovery - e.g. which protein sequence databases are available?
      • Who’s behind it?
        – International Society for Biocuration
        – Resource catalogues: Bioinformatics Links, BioSiteMaps, NAR db-issue etc.
        – Working group includes reps from the NAR and DATABASE journals, MIBBI, model organism db’s, others
      • Catalogue will have persistent identifiers for each db entry
      http://www.biosharing.org/biodbcore
  17. 17. • [slot in Pierre]
  18. 18. From Patients to BioBanks and back…
      • Persistent IDs for datasets & other digital resources
        – Absolute need
      • From BioresourceResearchIF to BioresourceXIF
        – More than an IP address?
      • Increasing need for identification of the source of information in general
        – Not only for research purposes…
        – “Big data”
        – Quantified self
      • Blurring the border between: research data (non-CLIA), clinically approved data, consumer-centered data
  19. 19. [Diagram: Database Gateway & Computations. Labels: User data, Imaging, Reference, Front-end, Individual data, groups of patients, tablet Application]
      Copyright © 2012 The Regents of the University of California, USA - All rights reserved.
  20. 20. Conclusions / next steps
      • Complex landscape, lots of problems to tackle
      • Key challenge will be to get authors to use the right identifiers
        – education, awareness, best practices, journal guidelines etc.
        – build support into tools that researchers use
      • Potential outputs from BRIF subgroup, by end of GEN2PHEN
        – Continue work on whitepaper on identifiers (partially drafted earlier in the year)
        – Compile recommendations for authors & biobankers, for use cases where workable solutions exist or are emerging (data DOIs, BioDBCore)
      • Need some biobanker-expert help in ID subgroup!
        – Esp. to look in depth into study catalogues with established identifier schemes
          • International Clinical Trials Registry Platform
          • ClinicalTrials.gov
          • P3G study catalogue
  21. 21. Acknowledgements
      GEN2PHEN Consortium: http://www.gen2phen.org/about-gen2phen/partners
      Prof Anthony J. Brookes
      Bioinformatics Group, Leicester
      This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
      Contact me! <gt50@le.ac.uk> | <gthorisson@gmail.com>
      http://www.linkedin.com/in/mummi
      http://www.twitter.com/gthorisson
      http://www.gthorisson.name
      Published under the CC BY license (http://creativecommons.org/licenses/by/3.0/)