RDAP13 Elizabeth Moss: The impact of data reuse
Kathleen Fear, ICPSR, University of Michigan

“The impact of data reuse: a pilot study of 5 measures”

Panel: Data citation and altmetrics
Research Data Access & Preservation Summit 2013
Baltimore, MD April 4, 2013 #rdap13

Slide notes
  • An environment with no standard way of citing research data and no established publishing infrastructure to support discovery and attribution
  • One of the reasons we were founded was to share data that not everyone could collect themselves: big, costly longitudinal studies; international studies; federally funded studies. All the more reason to make them available to everyone.
  • Will also create an API so that scripts can track alternative metrics, such as download statistics by user type
  • Click on the Find Publications link.
  • We provide study-level and citation-level metadata in an XML feed. We are happy to provide this to anyone to improve the landscape of data citation, discovery, and recognition.
  • DataPASS partners successfully lobbied ASA to include guidelines for data citation.
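The XML feed of study- and citation-level metadata mentioned in the notes could be consumed by a short script. The element and attribute names below are hypothetical illustrations (only the DOI, doi:10.3886/ICPSR04549, appears in the deck); ICPSR's actual feed schema will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed structure: one <study> record per archived study,
# carrying its DOI, title, and a count of known citations.
FEED = """\
<studies>
  <study doi="doi:10.3886/ICPSR04549">
    <title>Example Study, 2006 [United States]</title>
    <citations>2</citations>
  </study>
</studies>"""

def parse_feed(xml_text):
    """Extract (doi, title, citation count) from each <study> record."""
    root = ET.fromstring(xml_text)
    return [
        (s.get("doi"), s.findtext("title"), int(s.findtext("citations")))
        for s in root.iter("study")
    ]

for doi, title, n in parse_feed(FEED):
    print(f"{doi}: {title} ({n} known citations)")
```

A feed like this is what makes the "automated tracking" discussed later in the deck feasible: scripts match records by DOI rather than by fuzzy title search.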
  • Transcript

    • 1. Viable Data Citation: Expanding the Impact of Social Science Research. RDAP13 Panel on Data Citation and Altmetrics, April 5, 2013. Elizabeth Moss
    • 2. At ICPSR • Providing opportunities for tracking and measuring impact • Linking data to the literature, and the challenges involved • Aiding the cultural shift to viable citing practice (impact can be better measured if data use is readily discernible)
    • 3. Top 10 Data Downloads in the Previous Six Months (non-anonymous, distinct users downloading one or more files), by ICPSR study title and number of downloads:
      National Longitudinal Study of Adolescent Health (Add Health), 1994-2008 | 1817
      National Survey on Drug Use and Health, 2010 | 1109
      Chinese Household Income Project, 2002 | 648
      General Social Survey, 1972-2010 [Cumulative File] | 643
      National Survey on Drug Use and Health, 2011 | 603
      Collaborative Psychiatric Epidemiology Surveys (CPES), 2001-2003 [United States] | 527
      Health Behavior in School-Aged Children (HBSC), 2005-2006 | 509
      American National Election Study, 2008: Pre- and Post-Election Survey | 427
      India Human Development Survey (IHDS), 2005 | 395
      School Survey on Crime and Safety (SSOCS), 2006 | 339
    • 4. Who uses these shared data? With what impact?
    • 5. Obtaining ICPSR Metadata. ICPSR metadata are available in two formats, DDI Codebook XML and MARC21, and can be harvested via the OAI-PMH protocol.
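OAI-PMH is a standard HTTP-based harvesting protocol, so requesting these metadata formats amounts to building a query URL. A minimal sketch follows; the base endpoint URL is an assumption for illustration (check ICPSR's documentation for the real one), and the `metadataPrefix` values a repository supports can be discovered with the protocol's `ListMetadataFormats` verb.

```python
from urllib.parse import urlencode

# Assumed endpoint for illustration only; the real base URL may differ.
BASE_URL = "https://www.icpsr.umich.edu/icpsrweb/ICPSR/oai"

def oai_request(verb, **params):
    """Build an OAI-PMH request URL (verbs and parameters are standard)."""
    query = {"verb": verb, **params}
    return f"{BASE_URL}?{urlencode(query)}"

# Harvest DDI Codebook XML records; a MARC21 harvest would use a
# different metadataPrefix advertised by the repository.
url = oai_request("ListRecords", metadataPrefix="oai_ddi")
print(url)
```

Fetching the URL with any HTTP client returns an XML response page, with a resumption token for paging through large result sets.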
    • 6. Link research data to scholarly literature about it • Increase likelihood of discovery and re-use • Aid students, instructors, researchers, and funders. The ICPSR Bibliography of Data-related Literature
    • 7. It’s really a searchable database . . . containing 65,000 citations of known published and unpublished works resulting from analyses of data archived at ICPSR . . . that resides in Oracle, with an internal UI for database management . . . that can generate study bibliographies linking each study with the literature about it, and out to the full text
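The study-to-literature link described on this slide is essentially a many-to-many relation. A hypothetical, minimal sketch of such a schema is below, using SQLite for portability (the real system resides in Oracle and is far richer); the table and column names, the example DOI, and the example citation are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE study (
    id INTEGER PRIMARY KEY, doi TEXT, title TEXT);
CREATE TABLE citation (
    id INTEGER PRIMARY KEY, ref TEXT);
CREATE TABLE study_citation (               -- many-to-many link table
    study_id INTEGER REFERENCES study(id),
    citation_id INTEGER REFERENCES citation(id));
""")
conn.execute("INSERT INTO study VALUES (1, 'doi:10.3886/ICPSR99999', 'Example Study')")
conn.execute("INSERT INTO citation VALUES (1, 'Doe, J. (2012). An analysis of example data.')")
conn.execute("INSERT INTO study_citation VALUES (1, 1)")

def study_bibliography(doi):
    """Generate a study bibliography: all literature linked to one study."""
    rows = conn.execute("""
        SELECT c.ref FROM citation c
        JOIN study_citation sc ON sc.citation_id = c.id
        JOIN study s ON s.id = sc.study_id
        WHERE s.doi = ?""", (doi,))
    return [r[0] for r in rows]

print(study_bibliography("doi:10.3886/ICPSR99999"))
```

The link table is what lets one query produce a per-study bibliography and, in the other direction, find every study a given publication used.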
    • 8. It’s useful to all stakeholders. Instructors direct students to begin data-related research projects by reading some of the major works based on the data. Advanced researchers also use it to conduct a focused literature review before deciding to use a dataset. Reporters and policymakers looking for processed statistics look for reports explaining studies. Principal investigators and funding agencies want to track how data are used after they are deposited.
    • 9. But challenging to provide
    • 10. The state of data citation in the social science literature
    • 11. Data “sighting” (implicit) vs. citing (explicit): where in the article? Sample? Abstract? Methods? Acknowledgements? Discussion? Data charts and tables? Footnotes? Appendices? References!
    • 12. Typical “sightings” • Sample described, not named, no author information, no access information, only a publication cited • Data named in text, with some attribution, but no access information • Cited in the reference section, but with no permanent, unique identifier, so difficult for indexing scripts to find, making tracking hard to automate
    • 13. ICPSR advocates the use of DOIs • ICPSR has been providing citations to its data since 1990 and started assigning DOIs in 2008 • DOIs apply at the study or collection level (a study can have multiple datasets) and resolve to the study home page with the richest metadata • DOIs are of the form: doi:10.3886/ICPSR04549
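A persistent identifier of this form is exactly what makes automated tracking tractable: a script can match it with a simple pattern and hand it to the doi.org resolver. A minimal sketch, deliberately narrowed to the ICPSR form shown on the slide (general DOI syntax is much broader):

```python
import re

# Matches DOIs of the form doi:10.3886/ICPSRnnnnn, as shown on the slide.
ICPSR_DOI = re.compile(r"doi:(10\.3886/ICPSR\d+)")

def find_icpsr_dois(text):
    """Return doi.org resolver URLs for every ICPSR-style DOI in text."""
    return ["https://doi.org/" + m for m in ICPSR_DOI.findall(text)]

sample = "Data: ICPSR study, doi:10.3886/ICPSR04549, analyzed in 2012."
print(find_icpsr_dois(sample))
# → ['https://doi.org/10.3886/ICPSR04549']
```

Contrast this with the “sightings” on the previous slide, where a script would have to recognize a study by free-text description alone.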
    • 14. Atypical “citing”: in the references, with the DOI doi:10.3886/ICPSR21240
    • 15. Challenges in database search infrastructure • Journal databases fielded for journal-article discovery are not ideal for finding data “sightations” • No field searching on methods sections • Full-text search brings back too many bad hits • Limiting to abstracts misses too many good hits
    • 16. Challenges in tracking many studies • Tension between highly curating a manageable collection and minimally maintaining a broad collection • Too many publications for efficient collection by humans, so we must make it easy for scripts to do it reliably
    • 17. Challenges of completeness • Data use that is too difficult or costly to find cannot be counted • A selective sample makes it difficult to draw accurate conclusions in broad analyses of re-use
    • 18. Challenges in publishing practice, and lack of data management planning • Publishing sequence prevents citation creation before publication • Potential for change by educating the PI/mentor • Consciousness raising starting to occur due to funders’ requirements
    • 19. Poorly described and cited data + excessive human search effort = too costly, too questionable for a confident measure of impact
    • 20. Citing data with a DOI + minimal human search effort = high hit accuracy for the cost, and better confidence in impact measures
    • 21. Finding data with simple search fields. Integration with Web of Knowledge All Databases: research data is equal to research literature
    • 22. Articles linked to underlying data. Increased data discovery. Reward for data citation. Potential for automated tracking. Converting journal search infrastructure to meet the needs of data, but synching metadata is still a work in progress.
    • 23. Building a culture of viable data citation to improve measures of impact
    • 24. Provide PIs and users with citations and DOIs for all study-level data
    • 25. Join groups advocating viable data citing practice
    • 26. Work with partner repositories to change publishing practice
    • 27. Three meetings: journal editors, domain repositories, and funders • Establish consistent data citation in social science journals • Encourage transparency in research • Optimize editorial work flows: sequencing • Develop common standards for repositories • Find long-term funding models for repository sustainability
    • 28. Thank you. Elizabeth Moss