Kathleen Fear, ICPSR, University of Michigan
“The impact of data reuse: a pilot study of 5 measures”
Panel: Data citation and altmetrics
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13
RDAP13 Elizabeth Moss: The impact of data reuse
1. Viable Data Citation:
Expanding the Impact of
Social Science Research
RDAP13 Panel on Data Citation
and Altmetrics, April 5, 2013
Elizabeth Moss, ICPSR
eammoss@umich.edu
2. At ICPSR
• Providing opportunities for tracking and
measuring impact
• Linking data to the literature, and the
challenges involved
• Aiding the cultural shift to viable citing
practice (impact can be better measured
if data use is readily discernible)
4. Top 10 Data Downloads in the Previous Six Months
(non-anonymous, distinct users downloading one or more files)
ICPSR Study Title # Downloads
National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 1817
National Survey on Drug Use and Health, 2010 1109
Chinese Household Income Project, 2002 648
General Social Survey, 1972-2010 [Cumulative File] 643
National Survey on Drug Use and Health, 2011 603
Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] 527
Health Behavior in School-Aged Children (HBSC), 2005-2006 509
American National Election Study, 2008: Pre- and Post-Election Survey 427
India Human Development Survey (IHDS), 2005 395
School Survey on Crime and Safety (SSOCS), 2006 339
10. Link research data to scholarly
literature about it
• Increase likelihood of discovery and re-use
• Aid students, instructors, researchers, and
funders
The ICPSR Bibliography of Data-related Literature
11. It’s really a searchable database . . .
. . . containing 65,000 citations of known published
and unpublished works resulting from analyses of
data archived at ICPSR
. . . that resides in Oracle, with an internal UI for
database management
. . . that can generate study bibliographies
linking each study with the literature about it, and
out to the full text
21. It’s useful to all stakeholders
Instructors direct students to begin data-related
research projects by reading some of the major works
based on the data
Advanced researchers also use it to conduct a focused
literature review before deciding to use a dataset
Reporters and policymakers looking for processed
statistics look for reports explaining studies
Principal investigators and funding agencies want to
track how data are used after they are deposited
23. The state of data citation in the
social science literature
24. Data “Sighting” (implicit) vs. Data Citing (explicit)
Sample? Abstract? Methods? Acknowledgements? Discussion? Charts and Tables? Footnotes? Appendices? References!
25. Typical “sightings”
• Sample described, not named, no author
information, no access information, only a
publication cited
• Data named in text, with some attribution, but
no access information
• Cited in reference section, but with no
permanent, unique identifier, so difficult for
indexing scripts to find to automate tracking
26. ICPSR advocates the use of DOIs
• ICPSR has been providing citations to its data since
1990 and started assigning DOIs in 2008
• DOIs apply at the study or collection level (a study
can have multiple datasets) and resolve to the
study home page with richest metadata
• DOIs are of the form: doi:10.3886/ICPSR04549
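A minimal sketch of how a script might recognize DOIs of this form in free text and turn them into resolver links. The pattern is inferred only from the single example on the slide (the 10.3886 prefix followed by "ICPSR" and digits), so real DOIs with other suffixes would need a broader pattern. The function names are illustrative, not part of any ICPSR tool.

```python
import re

# Assumption: the study number is the run of digits after "ICPSR", as in the
# slide's example doi:10.3886/ICPSR04549. Other DOI shapes are not handled.
ICPSR_DOI = re.compile(r"10\.3886/ICPSR(\d+)", re.IGNORECASE)


def find_icpsr_dois(text):
    """Return the distinct ICPSR DOIs found in text, in order of appearance."""
    found = []
    for match in ICPSR_DOI.finditer(text):
        # Normalize case so "icpsr04549" and "ICPSR04549" count as one DOI.
        doi = "10.3886/ICPSR" + match.group(1)
        if doi not in found:
            found.append(doi)
    return found


def resolver_url(doi):
    """Build a link through the public doi.org resolver."""
    return "https://doi.org/" + doi


reference = "Data available at doi:10.3886/ICPSR04549 (accessed 2013)."
for doi in find_icpsr_dois(reference):
    print(doi, "->", resolver_url(doi))
```

Because the identifier is unique and has a fixed shape, one pattern covers every study, with no per-study list of titles or keywords to maintain.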
28. Challenges in database search infrastructure
• Journal databases fielded for journal article
discovery are not ideal for finding
data “sightation”
• No field searching on methods sections
• Full-text search brings back too many bad hits
• Limiting to abstract misses too many good hits
29. Challenges in tracking many studies
• Tension between highly curating a
manageable collection and minimally
maintaining a broad collection
• Too many publications for efficient
collection by humans, so we must make it
easy for scripts to do it reliably
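To illustrate what "easy for scripts" could look like, here is a hedged sketch that tallies how many publications cite each study once reference lists carry DOIs. The publication IDs and reference strings are invented sample data, and the pattern assumes the DOI form shown on slide 26. A real pipeline would read references from a publisher or index feed instead.

```python
import re
from collections import Counter

# Assumption: DOIs look like the slide's example, 10.3886/ICPSR04549.
DOI_PATTERN = re.compile(r"10\.3886/ICPSR\d+", re.IGNORECASE)


def tally_study_citations(publications):
    """Count, per study DOI, the publications that cite it at least once.

    publications maps a publication id to its list of reference strings.
    Returns (counts, undetected), where undetected lists publications in
    which no DOI was found and a human would still have to look.
    """
    counts = Counter()
    undetected = []
    for pub_id, references in publications.items():
        dois = {
            match.group(0).upper()
            for ref in references
            for match in DOI_PATTERN.finditer(ref)
        }
        if dois:
            counts.update(dois)
        else:
            undetected.append(pub_id)
    return counts, undetected


# Hypothetical sample data, for illustration only.
sample = {
    "pub-A": ["Author (2008). Study title [data file]. doi:10.3886/ICPSR04549"],
    "pub-B": ["We used a national survey sample described in Author (2009)."],
    "pub-C": ["Author (2008). Study title. https://doi.org/10.3886/ICPSR04549"],
}
counts, undetected = tally_study_citations(sample)
print(dict(counts))
print(undetected)
```

The publication that only describes its sample ("pub-B") lands in the undetected list, which is the case that still requires costly human searching.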
30. Challenges of completeness
• Data use that is too difficult/costly to find cannot
be counted
• A selective sample, difficult to draw accurate
conclusions in broad analyses of re-use
31. Challenges in publishing practice, and
lack of data management planning
• Publishing sequence prevents citation
creation before publication
• Potential for change by educating the
PI/mentor
• Consciousness raising starting to occur due
to funders’ requirements
37. Poorly described and cited data
+
Excessive human search effort
=
Too costly, too questionable for confident
measure of impact
41. Citing data with a DOI
+
Minimal human search effort
=
High hit accuracy for the cost, and better
confidence of impact measures
43. Finding data with simple search fields
Integration with Web of Knowledge
All Databases: Research data is
equal to research literature
44. Articles linked to underlying data.
Increased data discovery.
Reward for data citation.
Potential for automated tracking.
Converting journal search
infrastructure to meet the needs of
data, but synching metadata still a
work in progress.
45. Building a culture of viable data citation
to improve measures of impact
46. Provide PIs and users with citations and
DOIs for all study-level data
52. Three meetings: Journal editors,
domain repositories, and funders
• Establish consistent data citation in social
science journals
• Encourage transparency in research
• Optimize editorial work flows: sequencing
• Develop common standards for repositories
• Find long-term funding models for repository
sustainability
An environment with no standard way of citing research data and no established publishing infrastructure to optimize good discovery and attribution
One of the reasons we were founded was to share data that not everyone could collect themselves:
• Big, costly longitudinal studies
• International studies
• Federally funded studies
All the more reason to make them available to everyone
We will also create an API so that scripts can track alternative metrics, such as download statistics by user type
Click on the Find Publications link.
We provide study-level and citation-level metadata in an XML feed. We are happy to provide this to anyone to improve the landscape of data citation, discovery, and recognition.
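As a rough sketch of how a consumer might use such a feed, the example below groups citations by study. The feed's actual schema is not given in these slides, so the element and attribute names (studies, study, doi, citation, title, year) and the sample entries are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout; the real element and attribute names may differ.
SAMPLE_FEED = """<studies>
  <study doi="10.3886/ICPSR04549">
    <citation><title>Example article A</title><year>2010</year></citation>
    <citation><title>Example article B</title><year>2012</year></citation>
  </study>
</studies>"""


def citations_by_study(feed_xml):
    """Map each study DOI to a list of (title, year) tuples from the feed."""
    root = ET.fromstring(feed_xml)
    result = {}
    for study in root.findall("study"):
        result[study.get("doi")] = [
            (citation.findtext("title"), int(citation.findtext("year")))
            for citation in study.findall("citation")
        ]
    return result


bibliography = citations_by_study(SAMPLE_FEED)
for doi, citations in bibliography.items():
    print(doi, len(citations), "citations")
```

Keying the result on the study DOI keeps study-level and citation-level metadata joined on the same identifier that appears in reference lists.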
DataPASS partners successfully lobbied ASA to include guidelines for data citation.