2. Motivation
Data deposit vs. data reuse
Why track the reuse of data?
Transparency
Collaboration
Confirm existing data
Refute existing data
Combine with existing data to form new conclusions
Healthy Competition
Invigoration
3. Initial Questions
How is data currently cited and how often?
How do we find data citations using available resources (search engines, databases, etc.)?
How difficult is it to find data citations using these tools and why?
What are the best/worst ways to find data citations?
How do the citations vary across discipline, repository and publication?
What is the most common citation? Repository name? Data author name? Unique identifier like a study number or DOI?
4. To whose benefit?
Scientists
Academic researchers
Students
Anyone who uses or deposits data
Anyone interested in the citation or reuse of data
Similar projects
See also: list of projects, discussion and editorials on the OpenWetware DataONE Web Resources page:
http://openwetware.org/wiki/User:Valerie_Enriquez/Notebook/DataONE_Web_resources
5. Methods
Initial search process:
TreeBASE searches
Focused search
Repositories
1. TreeBASE
2. Pangaea
3. ORNL DAAC
Databases
1. ISI Web of Science Cited Reference Search
2. Scirus
3. Google Scholar
Test Limits
Date range: 2008-2010
Language: English
Journal articles only
Repository-specific search terms
TreeBASE: repository name, study accession number (S####), data author name
Pangaea: repository name, DOI prefix: 10.1594/PANGAEA.######, data author name
ORNL DAAC: repository name, DOI prefix: 10.3334/ORNLDAAC/###, data author name, project name (BOREAS, FLUXNET, etc.)
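The repository-specific identifiers listed on this slide lend themselves to simple pattern matching. The following is a minimal sketch, not the method used in the project: it assumes that a TreeBASE accession is "S" followed by digits and that the Pangaea and ORNL DAAC DOIs are the stated prefix followed by digits. The slide shows only placeholders (S####, ######, ###), so the digit counts are left open, and the function name and sample text are hypothetical.

```python
import re

# Patterns derived from the repository-specific search terms on this slide.
# Assumption: the placeholders (S####, ######, ###) stand for runs of digits
# of unspecified length, so any run of one or more digits is accepted.
IDENTIFIER_PATTERNS = {
    "TreeBASE": re.compile(r"\bS\d+\b"),
    "Pangaea": re.compile(r"\b10\.1594/PANGAEA\.\d+", re.IGNORECASE),
    "ORNL DAAC": re.compile(r"\b10\.3334/ORNLDAAC/\d+", re.IGNORECASE),
}


def find_data_identifiers(text):
    """Return {repository: [identifiers]} for every pattern found in text."""
    found = {}
    for repository, pattern in IDENTIFIER_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            found[repository] = matches
    return found


# Hypothetical reference text, invented for illustration only.
sample = (
    "Data are available from TreeBASE (study accession S1234) and from "
    "doi:10.1594/PANGAEA.123456."
)
print(find_data_identifiers(sample))
```

A bare pattern such as S followed by digits will also match unrelated strings (supplementary table labels, for example), so any match would still need manual review before being counted as a data citation.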
6. Initial Analysis
1. Search comparison spreadsheet hosted here
Captured the search methods, the search terms and the datasets used to construct those terms, along with the total number of results and the respective hits and misses.
Percentages of hits vs. misses are calculated within the spreadsheet.
Reasons for miss captured
Reasons for hit captured
2. Shared fields template from Sarah with my input data hosted here
Hosts data about individual articles, including DOIs as applicable, metadata and coding for hits and misses.
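The hit vs. miss percentages described for the search comparison spreadsheet amount to simple arithmetic. As a rough sketch, assuming each spreadsheet row records a count of hits and a count of misses, the calculation could look like this. The function name, row layout and all counts are hypothetical and are not taken from the spreadsheet.

```python
def hit_miss_percentages(hits, misses):
    """Return (hit %, miss %) of the total results, rounded to one decimal."""
    total = hits + misses
    if total == 0:
        # No results at all: report zero for both instead of dividing by zero.
        return 0.0, 0.0
    return round(100 * hits / total, 1), round(100 * misses / total, 1)


# Hypothetical rows: (database, search term type, hits, misses).
# The counts are invented for illustration and are not project results.
rows = [
    ("Database A", "repository name", 12, 28),
    ("Database B", "DOI prefix", 9, 1),
]

for database, term, hits, misses in rows:
    hit_pct, miss_pct = hit_miss_percentages(hits, misses)
    print(f"{database} / {term}: {hit_pct}% hits, {miss_pct}% misses")
```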
7. Stumbles and other Worrisome Things
Finding focus and the difficulty of going beyond the obvious
“Missing” searches
How broad is too broad? How narrow is too narrow?
Article cited vs. data cited
Image courtesy of: http://currentskateofmind.com/2008/03/25/glossary-of-skating-falls/
8. Initial Findings
Key: $ = effective search, # = ineffective search, * = invalid field input

TreeBASE
ISI Web of Science: 1. $ Repository name; 2. *; 3. $ Cited Author Name/original publication title/date
Scirus: 1. $ Repository name; 2. # Study Accession Number; 3. # Cited Author Name/original publication title/date
Google Scholar: 1. # Repository name; 2. # Study Accession Number; 3. # Cited Author Name/original publication title/date

Pangaea
ISI Web of Science: 1. $ Repository name; 2. *; 3. $ Cited Author Name/original publication title/date
Scirus: 1. Repository name; 2. $ DOI prefix; 3. # Cited Author Name/original publication title/date
Google Scholar: 1. # Repository name; 2. $ DOI prefix; 3. # Cited Author Name/original publication title/date

ORNL DAAC
ISI Web of Science: 1. $ Repository name; 2. *; 3. $ Cited Author Name/original publication title/date
Scirus: 1. $ Repository name; 2. $ DOI prefix; 3. $ Cited Author Name/project name/original publication title/date
Google Scholar: 1. # Repository name; 2. $ DOI prefix; 3. $ Cited Author Name/project name/original publication title/date
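To compare the databases at a glance, the markers in the grid on this slide can be tallied per database. The sketch below is one possible encoding, with each list holding the markers for search terms 1 to 3 as read from the slide. The dictionary layout and function name are assumptions made for illustration, and the one cell that carries no marker on the slide (Scirus, Pangaea, repository name) is stored as None and left unfilled.

```python
from collections import Counter

# Marker key from the slide: $ effective, # ineffective, * invalid field input.
# FINDINGS[repository][database] = markers for search terms 1, 2 and 3.
# None marks the single cell that has no marker on the slide.
FINDINGS = {
    "TreeBASE": {
        "ISI Web of Science": ["$", "*", "$"],
        "Scirus": ["$", "#", "#"],
        "Google Scholar": ["#", "#", "#"],
    },
    "Pangaea": {
        "ISI Web of Science": ["$", "*", "$"],
        "Scirus": [None, "$", "#"],
        "Google Scholar": ["#", "$", "#"],
    },
    "ORNL DAAC": {
        "ISI Web of Science": ["$", "*", "$"],
        "Scirus": ["$", "$", "$"],
        "Google Scholar": ["#", "$", "$"],
    },
}


def tally_by_database(findings):
    """Count how often each marker appears for each database."""
    tallies = {}
    for per_database in findings.values():
        for database, markers in per_database.items():
            tallies.setdefault(database, Counter()).update(markers)
    return tallies


for database, counts in tally_by_database(FINDINGS).items():
    print(database, dict(counts))
```

The tally treats all three search terms as equally important, which is a simplification: a single effective identifier search may matter more in practice than several effective name searches.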
9. Lessons Learned
Hey, I think I found that data
citation you were looking for.
Image courtesy of: http://www.squidoo.com/stop_information_overload
10. Where do we go from here?
Solidify conclusions from initial findings.
Compare data with other interns.
Examine other repositories, search terms and databases.
Write an article about how difficult it is to find data reuse citations. Some possible publications:
Collection Management
D-Lib (link provided by Heather)
Information Services & Use (author guidelines)
Informing Science
International Digital Curation Conference (call for papers; link provided by Nic)
Journal of the American Society for Information Science & Technology
Journal of Information Science
Library Technology Reports
Scientometrics