Data Citation Principles Harvard May 2011: ORCID and data publication - Identifying knowledge contributors to motivate sharing
1. G. A. Thorisson, University of Leicester / ORCID
http://www.orcid.org
ORCID and data publication
Identifying knowledge contributors to motivate sharing
Gudmundur A. Thorisson <gt50@le.ac.uk>
Tony Brookes bioinformatics group
Department of Genetics
University of Leicester
-- Outline --
• Pretext: my route to workshop
• Ongoing & planned data publication projects
• Disease genetics data
• Planned integration with ORCID for researcher identification
• Role of ORCID in data publication ecosystem?
• [shameless] plug for Sept workshop on researcher identity
This work can be freely copied, redistributed and
adapted, as long as proper attribution is given
Data Citation Principles Workshop, IQSS, Harvard University, 16-17 May 2011
Monday, 16 May 2011
Prologue
Prof Anthony J Brookes
GEN2PHEN coordinator
Chair, Bioinformatics and Genomics
Department of Genetics
University of Leicester, UK
The data sharing problem
Lack of incentives for sharing
• Effort required to prepare, package and submit datasets to public
repositories
• Time better spent writing papers & grants
• All sticks (funders, journals) - no carrots
• Need incentives - treat data as publications and credit creators
“[...] Many of the issues regarding data availability can be
addressed if the principles of “publication” rather than “sharing”
are applied. However, online data publication systems also need
to develop mechanisms for data citation and indices of data
access comparable to those for citation systems in print
journals”
Costello, M. Motivating Online Publication of Data. BioScience
(2009) vol. 59 (5) pp. 418-427
Name ambiguity => attribution challenges
Are these authors all the same person?
G. Thorisson, University of Leicester
G. A. Thorisson, University of Leicester
G. A. Thorisson, Cold Spring Harbor Laboratory
How about these? Or these?
J. Smith
J. Smith
J. Smith
J. Smith
J. Smith
[etc.]
∼2/3 of the ∼6 million authors in MEDLINE share a last name and first
initial with at least one other author, and an ambiguous name refers to
∼8 persons on average.
Torvik and Smalheiser. Author name disambiguation in MEDLINE. ACM Transactions on Knowledge
Discovery from Data (2009) vol. 3 (3)
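The collision described above (same last name and first initial) can be sketched in a few lines. This is only an illustration: the `name_key` helper and the affiliations attached to the J. Smith records are assumptions added for the example, not part of the MEDLINE study.

```python
from collections import defaultdict

def name_key(full_name: str) -> str:
    """Reduce a printed name to 'lastname_firstinitial', e.g. 'thorisson_g'."""
    parts = full_name.replace(".", " ").split()
    return f"{parts[-1].lower()}_{parts[0][0].lower()}"

# Illustrative author records: (name as printed, affiliation)
records = [
    ("G. Thorisson", "University of Leicester"),
    ("G. A. Thorisson", "University of Leicester"),
    ("G. A. Thorisson", "Cold Spring Harbor Laboratory"),
    ("J. Smith", "Univ. North Pole"),
    ("J. Smith", "Luthor Corporation"),
]

# Group records that share a last name and first initial
groups = defaultdict(list)
for name, affiliation in records:
    groups[name_key(name)].append((name, affiliation))

# A key with more than one record cannot be attributed from the name alone
ambiguous = {key: recs for key, recs in groups.items() if len(recs) > 1}
for key, recs in ambiguous.items():
    print(key, len(recs))
```

The name key alone cannot tell whether a group is one person with several affiliations or several people with one name, which is the attribution problem in question.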
ORCID - tackling the contributor identity problem
[Figure: ambiguous author names mapped to ORCID identifiers]
ORCID ID: B-1242-2010 F67572010
G. Thorisson, Univ. Leicester
G. A. Thorisson, Univ. Leicester
G. A. Thorisson, Cold Spring Harbor Lab.
ORCID ID: G-1442-2009
J. Smith, Univ. North Pole
ORCID ID: D-2400-2010
J. Smith, Luthor Corporation
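A minimal sketch of the idea on this slide: once each name variant is linked to a contributor identifier, "same person?" becomes an identifier comparison. The `REGISTRY` dictionary and the `resolve` and `same_person` helpers are hypothetical, and the identifiers are the mock values from the slide, not real ORCID iDs.

```python
# Mock registry: (name as printed, affiliation) -> contributor identifier.
# Values are the placeholder identifiers shown on the slide.
REGISTRY = {
    ("G. Thorisson", "Univ. Leicester"): "B-1242-2010",
    ("G. A. Thorisson", "Univ. Leicester"): "B-1242-2010",
    ("G. A. Thorisson", "Cold Spring Harbor Lab."): "B-1242-2010",
    ("J. Smith", "Univ. North Pole"): "G-1442-2009",
    ("J. Smith", "Luthor Corporation"): "D-2400-2010",
}

def resolve(name, affiliation):
    """Return the identifier linked to this name/affiliation pair, or None."""
    return REGISTRY.get((name, affiliation))

def same_person(a, b):
    """True only when both records resolve to the same known identifier."""
    id_a, id_b = resolve(*a), resolve(*b)
    return id_a is not None and id_a == id_b

print(same_person(("G. Thorisson", "Univ. Leicester"),
                  ("G. A. Thorisson", "Cold Spring Harbor Lab.")))
print(same_person(("J. Smith", "Univ. North Pole"),
                  ("J. Smith", "Luthor Corporation")))
```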
Projects
Cafe Variome - facilitating exchange of genetic data
1. Diagnostic laboratories: publish data
2. Central ‘clearinghouse’
3. End-users (e.g. LSDB curators): retrieve Atom feeds
• Submitting mutations from diagnostic labs using “Café RouGE enabled” software via simple button click
• Data are shared with diverse 3rd parties via manual retrieval or automated feed-based monitoring/retrieval
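Automated feed-based monitoring could look roughly like the sketch below, which parses an Atom document and reports entries not seen on a previous poll. The feed content, the `urn:example:` identifiers and the `new_entries` helper are assumptions for illustration; the actual Cafe Variome feed layout is not given in the slides.

```python
import xml.etree.ElementTree as ET

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical feed content standing in for a clearinghouse response
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example mutation feed</title>
  <entry>
    <id>urn:example:mutation:1</id>
    <title>Variant 1 in gene X</title>
    <updated>2011-01-21T00:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:mutation:2</id>
    <title>Variant 2 in gene X</title>
    <updated>2011-01-22T00:00:00Z</updated>
  </entry>
</feed>"""

def new_entries(feed_xml, seen_ids):
    """Return (id, title) pairs for entries whose id is not yet in seen_ids."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", namespaces=ATOM_NS)
        title = entry.findtext("atom:title", namespaces=ATOM_NS)
        if entry_id not in seen_ids:
            seen_ids.add(entry_id)  # remember it for the next poll
            fresh.append((entry_id, title))
    return fresh

seen = set()
print(new_entries(SAMPLE_FEED, seen))  # first poll: both entries are new
print(new_entries(SAMPLE_FEED, seen))  # second poll: nothing new
```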
• Data shared with diverse 3rd parties and data usage/citation tracked via DOI
• Submission from diag. lab: ✔ DOI assigned to incoming data upload
• dbSNP (coding), UniProt, PhenCode: already stable IDs so no DOI assigned
• Attribution given to data submitters via ORCID unique identifier
• Metadata describing variation data published elsewhere
Publication credit for Cafe Variome deposits
4x variants in BRCA2
gene in patient X
G. Thorisson, Univ. Leicester
gthorisson@gmail.com
ORCID ID: A-883-2010
CV user has linked his user
account with his ORCID profile
G. A. Thorisson (A-883-2010). 4x variants in BRCA2 gene. Published online via Cafe
Variome. 21 January (2011) doi:10.1255/caferouge.BRCA2-2352354
=> http://api.caferouge.org/atomserver/v1/caferouge/mutations/2352354
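A rough sketch of how a citation string like the one above might be assembled from a deposit record plus a linked contributor identifier. The `Deposit` class and `format_citation` function are hypothetical, and the punctuation only loosely follows the slide's example.

```python
from dataclasses import dataclass

@dataclass
class Deposit:
    author: str
    contributor_id: str  # mock identifier from the slide, not a real ORCID iD
    title: str
    publisher: str
    date: str
    doi: str

def format_citation(d: Deposit) -> str:
    """Build a human-readable data citation from the deposit metadata."""
    return (f"{d.author} ({d.contributor_id}). {d.title}. "
            f"Published online via {d.publisher}. {d.date}. doi:{d.doi}")

deposit = Deposit(
    author="G. A. Thorisson",
    contributor_id="A-883-2010",
    title="4x variants in BRCA2 gene",
    publisher="Cafe Variome",
    date="21 January 2011",
    doi="10.1255/caferouge.BRCA2-2352354",
)
print(format_citation(deposit))
```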
GWAS nanopublications
• Foray into semantic publishing
– GWAS Central as ‘nano-publisher’
– variant<->disease assertion as nanopub
rs19243 <associatedWith> Type II diabetes
+
condition & provenance
• Provenance part to include:
– Contributor IDs
– Contributor roles:
• Author(s) on original GWAS paper
• Curator
• Registrant
• Citability: register DOI for nanopub?
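One way to picture the structure outlined above is a small record holding the assertion, its condition, and a provenance list of contributor IDs with roles. This is a sketch under assumptions, not GWAS Central's data model: the class names, the placeholder contributor IDs and the condition text are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional

ROLES = {"author", "curator", "registrant"}  # the roles listed on the slide

@dataclass(frozen=True)
class Contributor:
    contributor_id: str  # placeholder IDs are used in the example below
    role: str

    def __post_init__(self):
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")

@dataclass
class Nanopublication:
    subject: str
    predicate: str
    obj: str
    condition: str
    provenance: List[Contributor] = field(default_factory=list)
    doi: Optional[str] = None  # unset: DOI registration is posed as an open question

    def ids_with_role(self, role: str) -> List[str]:
        """Contributor IDs that hold the given role for this assertion."""
        return [c.contributor_id for c in self.provenance if c.role == role]

nanopub = Nanopublication(
    subject="rs19243",
    predicate="associatedWith",
    obj="Type II diabetes",
    condition="(study condition goes here)",
    provenance=[
        Contributor("ID-AUTHOR-1", "author"),
        Contributor("ID-CURATOR-1", "curator"),
        Contributor("ID-REGISTRANT-1", "registrant"),
    ],
)
print(nanopub.ids_with_role("curator"))
```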
BRIF - measuring bioresource use and impact
• Biobanks: collections of biomaterials + associated metadata
– Identification: citing, acknowledging, tracking use of
– Evaluation: assess impact
– Attribution: crediting PIs, repository managers, technicians [?]
• Digital resources, incl. biomedical databases
– E.g. locus-specific databases (LSDBs), variation archives (e.g. Cafe Variome)
– How to acknowledge researchers who:
• Maintain vital community resource (e.g. http://www.wormbase.org )
• Undertake value-adding curation
– Micro-attribution: Giardine, B. et al. Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach. Nature Genetics, advance online publication (2011). http://dx.doi.org/10.1038/ng.785
• BRIF online group: http://bit.ly/brif-group
Identifying & citing databases
• Bio-databases are often cited as a collection
– E.g. “In our analysis, we used release X of SwissProt”
“Our results were compared with the COL3A1 database as of Jan11”
– Example: OI variant database:
Dalgleish, R. (1998) Nucleic Acids Research 26(1), 253 http://dx.doi.org/10.1093/nar/26.1.253
Dalgleish, R. (1997) Nucleic Acids Research 25(1), 181 http://dx.doi.org/10.1093/nar/25.1.181
Osteogenesis Imperfecta Variant Database - https://oi.gene.le.ac.uk
• Are DOIs appropriate? - db’s are not ‘unchanging entities’
• Minimal information about a database - include DOI name?
– What does the DOI point to? URL for database site vs. URL for db description
Acknowledging contributions to bio-resources
• Database curation
– Overall mgmt/responsibility: R. Dalgleish A-3523-534-144 <maintained> 10.5335/lsdb.oi.325dff
– Temporary curator appointment: J. Smith G-1442-2009 <curated> 10.5335/lsdb.oi.325dff
– Microattribution: fine-grained tracking of curator activity (insert/update/delete)
– [see also GBIF presentation]
• Biobanking activities
– Principal Investigator responsible for project (aka ‘corresponding author’)
– Laboratory personnel?
– Clinical collaborators?
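The fine-grained tracking of curator activity mentioned under microattribution can be sketched as a tally over a curation log. The log entries, record IDs and the `tally` helper are hypothetical; the contributor identifiers are the mock values from the slide.

```python
from collections import Counter

# Hypothetical curation log: (contributor ID, action, record ID)
log = [
    ("G-1442-2009", "insert", "rec-1"),
    ("G-1442-2009", "update", "rec-1"),
    ("A-3523-534-144", "delete", "rec-2"),
    ("G-1442-2009", "insert", "rec-3"),
]

def tally(entries):
    """Count insert/update/delete actions per contributor ID."""
    counts = {}
    for who, action, _record in entries:
        counts.setdefault(who, Counter())[action] += 1
    return counts

for who, actions in tally(log).items():
    print(who, dict(actions))
```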
Characterizing citations and contributions
• What is the nature of the resource citation?
– acknowledgement / earlier or related work
– reused data or materials
– extended methodology
– ‘..this study is flawed and complete rubbish!!’
• What is the nature of my contribution to the resource?
– Paper: authored / undertook analysis / conceived of study / designed experiment
– Dataset: created / submitted / managed
– Database: curator / manager / PI responsible
– Biobank: sample collector / day-to-day manager / ??
– Temporal aspect:
• E.g. Mummi contributed in a curator role for SwissProt Jun 2004 to Oct 2009
Shotton, D., 2010. CiTO, the Citation
Typing Ontology. Journal of Biomedical
Semantics, 1(Suppl 1).
doi:10.1186/2041-1480-1-S1-S6
my study <cito:extends> Thorisson et al. 2008 doi:10.433/888544jamaX
my study <cito:usesSamplesFrom> Biobank X doi:10.424/35xxjapan.5 ??
G. Thorisson (A-523-44-3423) <pro:manager> Biobank X doi:10.424/35xxjapan??
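The three statements above are subject-predicate-object triples, so they can be held and queried as plain tuples. This sketch stores the predicates as strings exactly as written on the slide and does not check them against the CiTO or any other ontology; the `Statement` type and the helper functions are assumptions, and resource labels are used in place of the placeholder DOIs.

```python
from typing import NamedTuple

class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str

# Labels stand in for the slide's placeholder DOIs
statements = [
    Statement("my study", "cito:extends", "Thorisson et al. 2008"),
    Statement("my study", "cito:usesSamplesFrom", "Biobank X"),
    Statement("A-523-44-3423", "pro:manager", "Biobank X"),
]

def about(resource):
    """All statements whose object is the given resource."""
    return [s for s in statements if s.obj == resource]

def with_prefix(prefix):
    """All statements whose predicate uses the given vocabulary prefix."""
    return [s for s in statements if s.predicate.startswith(prefix + ":")]

for s in about("Biobank X"):
    print(s.subject, s.predicate, s.obj)
```

Typed predicates make it possible to separate citations (how a study relates to a resource) from contributions (how a person relates to a resource) in the same store.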
ORCID and contributor recognition
Why track all this stuff?
Enable aggregation of contributions by unique researcher ID
G. Thorisson, Univ. Leicester
gthorisson@gmail.com
ORCID ID: A-883-2010
• Who contributed to dataset 10.4259/psycho.5gtpq-thorisson?
• All data publications by A-883-2010?
• Which papers have cited the works of A-883-2010?
• Total no. citations to datasets by A-883-2010 in the last 2 years?
• Total no. downloads of datasets by A-883-2010?
• Which database projects has A-883-2010 contributed to?
• [...]
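A few of the questions above, sketched as queries over dataset records keyed by contributor identifier. Everything here is invented for illustration: the `10.9999/example.*` DOIs, the citation and download counts, and the helper functions.

```python
# Hypothetical dataset records; counts are made up for the example
datasets = [
    {"doi": "10.9999/example.1", "contributors": ["A-883-2010"],
     "citations": 3, "downloads": 40},
    {"doi": "10.9999/example.2", "contributors": ["A-883-2010", "G-1442-2009"],
     "citations": 1, "downloads": 12},
    {"doi": "10.9999/example.3", "contributors": ["G-1442-2009"],
     "citations": 5, "downloads": 7},
]

def contributors_of(doi):
    """Who contributed to the dataset with this DOI?"""
    for d in datasets:
        if d["doi"] == doi:
            return d["contributors"]
    return []

def datasets_by(contributor_id):
    """All data publications linked to one contributor ID."""
    return [d["doi"] for d in datasets if contributor_id in d["contributors"]]

def totals(contributor_id):
    """Total citations and downloads across a contributor's datasets."""
    mine = [d for d in datasets if contributor_id in d["contributors"]]
    return {
        "citations": sum(d["citations"] for d in mine),
        "downloads": sum(d["downloads"] for d in mine),
    }

print(datasets_by("A-883-2010"))
print(totals("A-883-2010"))
```

None of these queries is answerable from author name strings alone, which is the case for attaching a unique researcher ID to each contribution.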
Current ORCID status & timeline
• Alpha prototype
– Running on a sandbox website for limited testing
• partial functionality - based on ResearcherID software
• Timeline for live beta system: early 2012
• Early adopters / collaborators
• Looking to collaborate with projects
– Gather use cases => feed requirements for ORCID
core system
– WHERE/HOW might ORCID be used to identify
contributors?
– Joint fund-seeking to do pilot implementations
Example: SageCite?
• i) dataset published in SageCommons
– assigned DOI via DataCite
– attribution link deposited in ORCID
• ii) derivative datasets published in SageCommons
– assigned DOI => DataCite
– attribution link deposited in ORCID
• iii) analysis workflow published via myExperiment
– attribution => ORCID (creator/submitter & others who contributed)
– DOI (or not - not essential?)
<Shameless_plug>
</Shameless_plug>
Acknowledgements
GEN2PHEN Consortium
http://www.gen2phen.org/about-gen2phen/partners
Anthony J. Brookes Bioinformatics Group
This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
Contact me!
Gudmundur ‘Mummi’ Thorisson
<gt50@le.ac.uk> |<gthorisson@gmail.com>
http://friendfeed.com/mummi
http://www.linkedin.com/in/mummi
http://www.twitter.com/gthorisson