Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing
Usage Rights

CC Attribution License

Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing Presentation Transcript

  • 1. G. A. Thorisson, University of Leicester / ORCID http://www.orcid.org
    ORCID and data publication - Identifying knowledge contributors to motivate sharing
    Gudmundur A. Thorisson <gt50@le.ac.uk>, Tony Brookes bioinformatics group, Department of Genetics, University of Leicester
    Outline
    • Pretext: my route to workshop
    • Ongoing & planned data publication projects
    • Disease genetics data
    • Planned integration with ORCID for researcher identification
    • Role of ORCID in data publication ecosystem?
    • [shameless] plug for Sept workshop on researcher identity
    This work can be freely copied, redistributed and adapted, as long as proper attribution is given
    Data Citation Principles workshop, Harvard 16-17 May 2011. Monday, 16 May 2011
  • 2. Pretext
  • 4. Prof Anthony J Brookes
    GEN2PHEN coordinator
    Chair, Bioinformatics and Genomics
    Department of Genetics, University of Leicester, UK
  • 6. The data sharing problem
  • 7. Lack of incentives for sharing
    • Effort required to prepare, package and submit datasets to public repositories
    • Time better spent writing papers & grants
    • All sticks (funders, journals) - no carrots
    • Need incentives - treat data as publications and credit creators
    “[...] Many of the issues regarding data availability can be addressed if the principles of “publication” rather than “sharing” are applied. However, online data publication systems also need to develop mechanisms for data citation and indices of data access comparable to those for citation systems in print journals”
    Costello, M. Motivating Online Publication of Data. BioScience (2009) vol. 59 (5) pp. 418-427
  • 8. Name ambiguity => attribution challenges
    Are these authors all the same person?
    • G. Thorisson, University of Leicester
    • G. A. Thorisson, University of Leicester
    • G. A. Thorisson, Cold Spring Harbor Laboratory
    How about these? Or these?
    • J. Smith, J. Smith, J. Smith, J. Smith, J. Smith [etc.]
    ∼2/3 of the ∼6 million authors in MEDLINE share a last name and first initial with at least one other author, and an ambiguous name refers to ∼8 persons on average.
    Torvik and Smalheiser. Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data (2009) vol. 3 (3)
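    The "last name and first initial" collision described on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the name parsing is deliberately naive (it assumes "Given [Middle] Family" order), and the J. Smith affiliations are borrowed from the diagram on the following slide.

```python
from collections import defaultdict


def ambiguity_key(full_name):
    """Reduce a 'Given [Middle] Family' name to 'family, first-initial'."""
    parts = full_name.replace(".", " ").split()
    return f"{parts[-1].lower()}, {parts[0][0].lower()}"


def group_by_key(authors):
    """Group (name, affiliation) records that share the same ambiguity key."""
    groups = defaultdict(list)
    for name, affiliation in authors:
        groups[ambiguity_key(name)].append((name, affiliation))
    return dict(groups)


authors = [
    ("G. Thorisson", "University of Leicester"),
    ("G. A. Thorisson", "University of Leicester"),
    ("G. A. Thorisson", "Cold Spring Harbor Laboratory"),
    ("J. Smith", "Univ. North Pole"),
    ("J. Smith", "Luthor Corporation"),
]

groups = group_by_key(authors)
for key, records in groups.items():
    # The key alone cannot tell whether the records are one person or several.
    print(key, len(records))
```

    The grouping shows why a name string is not enough: three Thorisson records may be one person, while two J. Smith records may be two people, and nothing in the key distinguishes the cases.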
  • 9. ORCID - tackling the contributor identity problem
    [Diagram: name variants linked to ORCID IDs]
    ORCID ID: B-1242-2010 F67572010
    G. Thorisson, Univ. Leicester / G. A. Thorisson, Univ. Leicester / G. A. Thorisson, Cold Spring Harbor Lab.
    ORCID ID: G-1442-2009
    J. Smith, Univ. North Pole
    ORCID ID: D-2400-2010
    J. Smith, Luthor Corporation
  • 10. Projects
  • 11. Cafe Variome - facilitating exchange of genetic data
    [Diagram: 1. Diagnostic laboratories; 2. Central ‘clearinghouse’; 3. End-users (e.g. LSDB curators). Labels: Publish data; Retrieve; Atom feeds]
    • Submitting mutations from diagnostic labs using “Café RouGE enabled” software via simple button click
    • Data are shared with diverse 3rd parties via manual retrieval or automated feed-based monitoring/retrieval
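    The "automated feed-based monitoring/retrieval" on this slide could look roughly like the sketch below, which parses an Atom feed and reports entries not seen before. The sample feed, its entry IDs and the function name are invented for illustration; the actual Cafe Variome feed layout is not given in the slides and may differ.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Invented sample feed; a real clearinghouse feed may be structured differently.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Hypothetical mutation feed</title>
  <entry>
    <id>urn:example:mutation:1</id>
    <title>Example variant 1</title>
    <updated>2011-01-20T00:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:mutation:2</id>
    <title>Example variant 2</title>
    <updated>2011-01-21T00:00:00Z</updated>
  </entry>
</feed>"""


def new_entries(feed_xml, seen_ids):
    """Return (id, title) pairs for feed entries whose id is not in seen_ids."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", namespaces=ATOM_NS)
        title = entry.findtext("atom:title", namespaces=ATOM_NS)
        if entry_id not in seen_ids:
            fresh.append((entry_id, title))
    return fresh


# An end-user that has already retrieved the first entry only sees the second.
seen = {"urn:example:mutation:1"}
fresh_entries = new_entries(SAMPLE_FEED, seen)
print(fresh_entries)
```

    A real monitor would fetch the feed over HTTP on a schedule and persist the set of seen IDs; both are left out here to keep the sketch self-contained.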
  • 12. Cafe Variome - facilitating exchange of genetic data
    • Data shared with diverse 3rd parties and data usage/citation tracked via DOI
    • Submission from diag. lab: ✔ DOI assigned to incoming data upload
    • dbSNP (coding), UniProt, PhenCode: already stable IDs so no DOI assigned
    • Attribution given to data submitters via ORCID unique identifier
    • Metadata describing variation data published elsewhere
  • 15. Publication credit for Cafe Variome deposits
    • Deposit: 4x variants in BRCA2 gene in patient X
    • Submitter: G. Thorisson, Univ. Leicester, gthorisson@gmail.com, ORCID ID: A-883-2010
    • CV user has linked his user account with his ORCID profile
    • Citation: G. A. Thorisson (A-883-2010). 4x variants in BRCA2 gene. Published online via Cafe Variome. 21 January (2011) doi:10.1255/caferouge.BRCA2-2352354 => http://api.caferouge.org/atomserver/v1/caferouge/mutations/2352354
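    As a rough illustration of how a deposit record might be turned into the citation string shown on this slide, here is a small Python sketch. The `Deposit` class and its field names are hypothetical; only the values and the output layout are taken from the slide.

```python
from dataclasses import dataclass


@dataclass
class Deposit:
    """Hypothetical record for one data deposit; field names are assumptions."""
    author: str
    orcid: str
    title: str
    publisher: str
    day_month: str
    year: int
    doi: str


def format_citation(d):
    """Render a deposit in the citation layout used on the slide."""
    return (
        f"{d.author} ({d.orcid}). {d.title}. "
        f"Published online via {d.publisher}. "
        f"{d.day_month} ({d.year}) doi:{d.doi}"
    )


deposit = Deposit(
    author="G. A. Thorisson",
    orcid="A-883-2010",
    title="4x variants in BRCA2 gene",
    publisher="Cafe Variome",
    day_month="21 January",
    year=2011,
    doi="10.1255/caferouge.BRCA2-2352354",
)

citation = format_citation(deposit)
print(citation)
```

    Keeping the contributor ID and the DOI as separate fields, rather than only inside the formatted string, is what would later allow citations to be aggregated per researcher.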
  • 16. GWAS nanopublications
    • Foray into semantic publishing
      – GWAS Central as ‘nano-publisher’
      – variant<->disease assertion as nanopub: rs19243 <associatedWith> Type II diabetes + condition & provenance
    • Provenance part to include:
      – Contributor IDs
      – Contributor roles:
        • Author(s) on original GWAS paper
        • Curator
        • Registrant
    • Citability: register DOI for nanopub?
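    One way to picture the nanopublication outlined on this slide is as a single assertion bundled with its condition and provenance. The Python sketch below is an assumption-laden illustration, not the GWAS Central data model: the class names, field names, the condition text and the contributor IDs are all placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Assertion:
    subject: str
    predicate: str
    obj: str


@dataclass(frozen=True)
class Contributor:
    contributor_id: str  # e.g. an ORCID ID; the values below are placeholders
    role: str            # "author", "curator" or "registrant"


@dataclass
class Nanopublication:
    assertion: Assertion
    condition: str
    provenance: List[Contributor] = field(default_factory=list)
    doi: Optional[str] = None  # stays None until (if ever) a DOI is registered

    def contributors_with_role(self, role):
        """Return the IDs of all contributors holding the given role."""
        return [c.contributor_id for c in self.provenance if c.role == role]


nanopub = Nanopublication(
    assertion=Assertion("rs19243", "associatedWith", "Type II diabetes"),
    condition="placeholder condition text",
    provenance=[
        Contributor("HYPOTHETICAL-ID-1", "author"),
        Contributor("HYPOTHETICAL-ID-2", "curator"),
        Contributor("HYPOTHETICAL-ID-3", "registrant"),
    ],
)

print(nanopub.contributors_with_role("curator"))
```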
  • 17. BRIF - measuring bioresource use and impact
    • Biobanks: collections of biomaterials + associated metadata
      – Identification: citing, acknowledging, tracking use of
      – Evaluation: assess impact
      – Attribution: crediting PIs, repository managers, technicians [?]
    • Digital resources, incl. biomedical databases
      – E.g. locus-specific databases (LSDBs), variation archives (e.g. Cafe Variome)
      – How to acknowledge researchers who:
        • Maintain vital community resource (e.g. http://www.wormbase.org )
        • Undertake value-adding curation
      – Micro-attribution: Giardine, B. et al. Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach. Nature Genetics advance on, (2011). http://dx.doi.org/10.1038/ng.785
    • BRIF online group: http://bit.ly/brif-group
  • 21. Identifying & citing databases
    • Bio-databases are often cited as a collection
      – E.g. “In our analysis, we used release X of SwissProt”; “Our results were compared with the COL3A1 database as of Jan11”
      – Example: OI variant database:
        Dalgleish, R. (1998) Nucleic Acids Research 26(1), 253 http://dx.doi.org/10.1093/nar/26.1.253
        Dalgleish, R. (1997) Nucleic Acids Research 25(1), 181 http://dx.doi.org/10.1093/nar/25.1.181
        Osteogenesis Imperfecta Variant Database - https://oi.gene.le.ac.uk
    • Are DOIs appropriate? - db’s are not ‘unchanging entities’
    • Minimal information about a database - include DOI name?
      – What does the DOI point to? URL for database site vs. URL for db description
  • 24. Attributing contributions to bio-resources
    • Database curation
      – Management: R. Dalgleish A-3523-534-144 <maintained> 10.5335/lsdb.oi.325dff
        Temporary curator appointment: J. Smith G-1442-2009 <curated> 10.5335/lsdb.oi.325dff
      – Microattribution: fine-grained tracking of curator activity (insert/update/delete)
    • Biobanking activities
      – Principal Investigator responsible for project (aka ‘corresponding author’)
      – Laboratory personnel?
      – Clinical collaborators?
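    The microattribution idea on this slide, tracking curator activity at the level of individual inserts, updates and deletes, can be sketched as a simple tally over an activity log. The contributor IDs and the database DOI below are the ones shown on the slide; the log entries themselves and the function name are invented for illustration.

```python
from collections import Counter

ALLOWED_ACTIONS = {"insert", "update", "delete"}


def tally_curation(events):
    """Count curation actions per (curator ID, action) pair."""
    counts = Counter()
    for curator_id, resource_doi, action in events:
        if action not in ALLOWED_ACTIONS:
            raise ValueError(f"unknown curation action: {action}")
        counts[(curator_id, action)] += 1
    return counts


# Invented activity log: (curator ID, resource DOI, action).
events = [
    ("G-1442-2009", "10.5335/lsdb.oi.325dff", "insert"),
    ("G-1442-2009", "10.5335/lsdb.oi.325dff", "insert"),
    ("G-1442-2009", "10.5335/lsdb.oi.325dff", "update"),
    ("A-3523-534-144", "10.5335/lsdb.oi.325dff", "delete"),
]

counts = tally_curation(events)
for (curator_id, action), n in sorted(counts.items()):
    print(curator_id, action, n)
```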
  • 27. Characterizing citations and contributions
    • What is the nature of the resource citation?
      – acknowledgement / earlier or related work
      – reused data or materials
      – extended methodology
      – ‘..this study is flawed and complete rubbish!!’
    • What is the nature of my contribution to the resource?
      – Paper: authored / undertook analysis / conceived of study / designed experiment
      – Dataset: created / submitted / managed
      – Database: curator / manager / PI responsible
      – Biobank: sample collector / day-to-day manager / ??
      – Temporal aspect:
        • E.g. Mummi contributed in a curator role for SwissProt Jun 2004 to Oct 2009
  • 29. Semantic frameworks for scientific publishing
    Shotton, D., 2010. CiTO, the Citation Typing Ontology. Journal of Biomedical Semantics, 1(Suppl 1). doi:10.1186/2041-1480-1-S1-S6
    • my study <cito:extends> Thorisson et al. 2008 doi:10.433/888544jamaX
    • my study <cito:usesSamplesFrom> Biobank X doi:10.424/35xxjapan.5 ??
    • G. Thorisson (A-523-44-3423) <pro:manager> Biobank X doi:10.424/35xxjapan??
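    The typed statements on this slide are subject/predicate/object triples, which can be modelled with plain tuples. The sketch below copies the predicates and identifiers exactly as written on the slide; they have not been checked against the published CiTO or PRO vocabularies, and the helper names are hypothetical.

```python
# Triples copied from the slide: (subject, predicate, object).
triples = [
    ("my study", "cito:extends", "doi:10.433/888544jamaX"),
    ("my study", "cito:usesSamplesFrom", "doi:10.424/35xxjapan.5"),
    ("A-523-44-3423", "pro:manager", "doi:10.424/35xxjapan"),
]


def objects_of(triples, subject, predicate):
    """Return every object linked to subject by the given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def predicates_by_prefix(triples, prefix):
    """Return the distinct predicates in one namespace, e.g. 'cito'."""
    return sorted({p for _, p, _ in triples if p.split(":", 1)[0] == prefix})


extended = objects_of(triples, "my study", "cito:extends")
cito_predicates = predicates_by_prefix(triples, "cito")
print(extended)
print(cito_predicates)
```

    Typing the link, rather than recording a bare citation, is what lets a query distinguish "extends" from "uses samples from" or from a critical citation.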
  • 30. Role of ORCID?
  • 31. Why track all this stuff?
    Enable aggregation of contributions by unique researcher ID
    [Profile card: G. Thorisson, Univ. Leicester, gthorisson@gmail.com, ORCID ID: A-883-2010]
    • Who contributed to dataset 10.4259/psycho.5gtpq-thorisson?
    • All data publications by ORCID A-883-2010?
    • Which papers have cited the works of ORCID A-883-2010?
    • Total no. citations to datasets by A-883-2010 in the last 2 years?
    • Total no. downloads of datasets by A-883-2010?
    • Which database projects has A-883-2010 contributed to?
    • [...]
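    Two of the questions on this slide can be answered with simple lookups once attribution links are keyed by a unique researcher ID. The sketch below is illustrative only: the identifiers are taken from earlier slides, but the roles assigned to A-883-2010 and the record layout are assumptions, and the function names are hypothetical.

```python
# Attribution links: (researcher ID, role, resource DOI, resource kind).
# IDs and DOIs appear in the slides; the first two roles are assumed.
ATTRIBUTIONS = [
    ("A-883-2010", "submitted", "10.1255/caferouge.BRCA2-2352354", "dataset"),
    ("A-883-2010", "created", "10.4259/psycho.5gtpq-thorisson", "dataset"),
    ("A-3523-534-144", "maintained", "10.5335/lsdb.oi.325dff", "database"),
    ("G-1442-2009", "curated", "10.5335/lsdb.oi.325dff", "database"),
]


def contributors_to(doi):
    """Who contributed to this resource, and in which role?"""
    return [(rid, role) for rid, role, d, _ in ATTRIBUTIONS if d == doi]


def resources_by(researcher_id, kind=None):
    """All resources attributed to one researcher, optionally filtered by kind."""
    return [
        d for rid, _, d, k in ATTRIBUTIONS
        if rid == researcher_id and (kind is None or k == kind)
    ]


db_contributors = contributors_to("10.5335/lsdb.oi.325dff")
datasets = resources_by("A-883-2010", kind="dataset")
print(db_contributors)
print(datasets)
```

    Citation and download totals would need usage records joined on the same DOIs; no such data appears in the slides, so they are left out here.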
  • 33. Current ORCID status & timeline
    • Alpha prototype
      – Running on a sandbox website for limited testing
        • partial functionality - based on ResearcherID software
    • Timeline for live beta system: early 2012
    • Early adopters / collaborators
    • Looking to collaborate with projects
      – Gather use cases => feed requirements for ORCID core system
      – WHERE/HOW might ORCID be used to identify contributors?
      – Joint fund-seeking to do pilot implementations
  • 34. Example: SageCite?
    • i) dataset published in SageCommons
      – assigned DOI via DataCite
      – attribution link deposited in ORCID
    • ii) derivative datasets published in SageCommons
      – assigned DOI => DataCite
      – attribution link deposited in ORCID
    • iii) analysis workflow published via myExperiment
      – attribution => ORCID (creator/submitter & others who contributed)
      – DOI (or not - not essential?)
  • 36. Acknowledgements
    • GEN2PHEN Consortium: http://www.gen2phen.org/about-gen2phen/partners
    • This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
    • Anthony J. Brookes Bioinformatics Group
    Contact me! Gudmundur ‘Mummi’ Thorisson <gt50@le.ac.uk> | <gthorisson@gmail.com>
    http://friendfeed.com/mummi
    http://www.linkedin.com/in/mummi
    http://www.twitter.com/gthorisson