Motivation

- Data deposit vs. data reuse
- Why track the reuse of data?
  - Transparency
  - Collaboration
  - Confirm existing data
  - Refute existing data
  - Combine with existing data to form new conclusions
  - Healthy competition
  - Invigoration
Initial Questions

- How is data currently cited and how often?
- How do we find data citations using available resources (search engines, databases, etc.)?
- How difficult is it to find data citations using these tools and why?
- What are the best and worst ways to find data citations?
- How do the citations vary across discipline, repository and publication?
- What is the most common citation?
  - Repository name?
  - Data author name?
  - Unique identifier like a study number or DOI?
To whose benefit?

- Scientists
- Academic researchers
- Students
- Anyone who uses or deposits data
- Anyone interested in the citation or reuse of data

Similar projects

- See also: list of projects, discussion and editorials on the OpenWetware DataONE Web Resources page: http://openwetware.org/wiki/User:Valerie_Enriquez/Notebook/DataONE_Web_resources
Methods

Initial search process:
- TreeBASE searches

Focused search

Repositories
1. TreeBASE
2. Pangaea
3. ORNL DAAC

Databases
1. ISI Web of Science Cited Reference Search
2. Scirus
3. Google Scholar

Test limits
- Date range: 2008-2010
- Language: English
- Journal articles only

Repository-specific search terms
- TreeBASE: repository name, study accession number (S####), data author name
- Pangaea: repository name, DOI prefix: 10.1594/PANGAEA.######, data author name
- ORNL DAAC: repository name, DOI prefix: 10.3334/ORNLDAAC/###, data author name, project name (BOREAS, FLUXNET, etc.)
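The repository-specific identifiers above can also be spotted in text with pattern matching. This is a minimal sketch, not the search method used in this project: the function name `find_identifiers` is hypothetical, the sample text is invented, and the digit counts in the patterns are assumptions, because the slides show only placeholders (S####, ######, ###).

```python
import re

# Identifier shapes taken from the search terms above. The digit counts are
# assumptions: the slides only show placeholders (S####, ######, ###).
PATTERNS = {
    "TreeBASE": re.compile(r"\bS\d{3,5}\b"),
    "Pangaea": re.compile(r"10\.1594/PANGAEA\.\d+", re.IGNORECASE),
    "ORNL DAAC": re.compile(r"10\.3334/ORNLDAAC/\d+", re.IGNORECASE),
}


def find_identifiers(text):
    """Return {repository: [identifier strings found in text]}."""
    return {repo: pattern.findall(text) for repo, pattern in PATTERNS.items()}


# Hypothetical reference-list text, invented for illustration only.
sample = (
    "Data are available from TreeBASE (study accession number S1234). "
    "Temperature profiles: doi:10.1594/PANGAEA.123456. "
    "Flux data: doi:10.3334/ORNLDAAC/123."
)
hits = find_identifiers(sample)
for repo, found in hits.items():
    print(repo, found)
```

A match on an identifier only shows that the string appears. Whether the article cites the data itself or only the original publication still has to be checked by hand.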
Initial Analysis

1. Search comparison spreadsheet hosted here
   - Search methods, terms and datasets used to construct search terms were captured, as well as the total number of results followed by respective hits and misses.
   - Percentages of hits vs. misses calculated within the spreadsheet.
   - Reasons for miss captured
   - Reasons for hit captured
2. Shared fields template from Sarah with my input data hosted here
   - Hosts data about individual articles, including DOIs as applicable, metadata and coding for hits and misses.
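The hit and miss percentages were calculated in the spreadsheet. As a rough sketch of the same arithmetic, assuming each search result is coded as either a hit or a miss (the function name and the counts below are hypothetical and are not project data):

```python
def hit_miss_percentages(hits, misses):
    """Return (percent hits, percent misses) for one search."""
    total = hits + misses
    if total == 0:
        # A search with no results has no meaningful percentages.
        return (0.0, 0.0)
    return (100.0 * hits / total, 100.0 * misses / total)


# Hypothetical counts for a single search, for illustration only.
pct_hits, pct_misses = hit_miss_percentages(hits=12, misses=48)
print(f"hits: {pct_hits:.1f}%  misses: {pct_misses:.1f}%")
```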
Stumbles and other Worrisome Things

- Finding focus and the difficulty of going beyond the obvious
- “Missing” searches
- How broad is too broad?
- How narrow is too narrow?
- Article cited vs. data cited

Image courtesy of: http://currentskateofmind.com/2008/03/25/glossary-of-skating-falls/
Initial Findings

Key: $ = effective search; # = ineffective search; * = invalid field input

TreeBASE
- ISI Web of Science
  1. $ Repository name
  2. *
  3. $ Cited author name/original publication title/date
- Scirus
  1. $ Repository name
  2. # Study accession number
  3. # Cited author name/original publication title/date
- Google Scholar
  1. # Repository name
  2. # Study accession number
  3. # Cited author name/original publication title/date

Pangaea
- ISI Web of Science
  1. $ Repository name
  2. *
  3. $ Cited author name/original publication title/date
- Scirus
  1. Repository name
  2. $ DOI prefix
  3. # Cited author name/original publication title/date
- Google Scholar
  1. # Repository name
  2. $ DOI prefix
  3. # Cited author name/original publication title/date

ORNL DAAC
- ISI Web of Science
  1. $ Repository name
  2. *
  3. $ Cited author name/original publication title/date
- Scirus
  1. $ Repository name
  2. $ DOI prefix
  3. $ Cited author name/project name/original publication title/date
- Google Scholar
  1. # Repository name
  2. $ DOI prefix
  3. $ Cited author name/project name/original publication title/date
Lessons Learned

"Hey, I think I found that data citation you were looking for."

Image courtesy of: http://www.squidoo.com/stop_information_overload
Where do we go from here?

- Solidify conclusions from initial findings.
- Compare data with other interns.
- Examine other repositories, search terms and databases.
- Write article about how difficult it is to find data reuse citations.

Some possible publications:
- Collection Management
- D-Lib (link provided by Heather)
- Information Services & Use (Author Guidelines)
- Informing Science
- International Digital Curation Conference (Call for Papers; link provided by Nic)
- Journal of the American Society for Information Science & Technology
- Journal of Information Science
- Library Technology Reports
- Scientometrics