Your SlideShare is downloading. ×
RDAP14: Learning to Curate Panel
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

RDAP14: Learning to Curate Panel

639
views

Published on

Research Data Access and Preservation Summit, 2014 …

Research Data Access and Preservation Summit, 2014
San Diego, CA
March 26-28, 2014

Jared Lyle, ICPSR
Jennifer Doty, Emory University
Joel Herndon, Duke University
Libbie Stephenson, University of California, Los Angeles

Published in: Education

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
639
On Slideshare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
11
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. Learning  to  Curate:     Lessons  from  an  ICPSR  Pilot             Jared  Lyle   RDAP  2014  
  • 2. h3p://www.icpsr.umich.edu  
  • 3. Background  
  • 4. Data  Sharing  (N=935)   Federal   Agency   Shared   Formally,   Archived   (n=111)   Shared   Informally,   Not   Archived   (n=415)   Not   Shared   (n=409)   NSF   (27.3%)   22.4%   43.7%   33.9%   NIH   (72.7%)   7.4%   45.0%   47.6%   Total   11.5%   44.6%   43.9%   Pienta, Alter, & Lyle (2010). “The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data”. http://hdl.handle.net/2027.42/78307
  • 5. Vines et al. Current Biology 24, 94–97, January 6, 2014 http://dx.doi.org/10.1016/j.cub.2013.11.014 Image: http://www.peerreviewcongress.org/2013/Plenary-Session-Abstracts-9-9.pdf
  • 6. What  is  CuraJon?  
  • 7. A  well-­‐prepared  data  collecRon   “contains  informaRon  intended  to   be  complete  and  self-­‐explanatory”   for  future  users.  
  • 8. A  corollary:  Do  no  harm.   http://img.gawkerassets.com/img/17xbuy519gga2jpg/ku-xlarge.jpg
  • 9. CollaboraJve  CuraJon  
  • 10. Partnerships Green, Ann G., and Myron P. Gutmann. (2007) "Building Partnerships Among Social Science  Researchers, Institution- based Repositories, and Domain Specific Data Archives."  OCLC Systems and Services: International Digital Library Perspectives. 23: 35-53.   http://hdl.handle.net/2027.42/41214 “We propose that domain specific archives partner with institution based repositories to provide expertise, tools, guidelines, and best practices to the research communities they serve.”
  • 11. Support:
  • 12. Ron Nakao, Stanford Libbie Stephenson, UCLA Jon Stiles, UC Berkeley Jen Doty, Emory Rob O’Reilly, Emory Joel Herndon, Duke
  • 13. Pilot  Goals  
  • 14. For  parRcipants:   •  Apply  curaRon  theories  to  pracRce  through   actual  data  processing.   •  Will  have  a  fully  curated  data  collecRon  ready   for  archiving  at  the  end  of  the  session.   •  Interact  with  and  ask  quesRons  of  other  data   specialists  within  a  working  environment.   •  Gain  first-­‐hand  experience  using  ICPSR’s  internal   tools  and  workflows  for  curaRon.   •  Understand  level  of  effort  to  work  through   collecRons  and  provide  assistance  to   researchers.   •  Learn  about  things  not  thought  about  (e.g.,   cosRng,  standardized  workflows).  
  • 15. For  ICPSR:   •  Engage  with  outside  data  curators  to  learn   what  others  are  doing  and  thinking.   •  Polish  internal  procedures  and  tools  by   opening  them  to  outside  review  and  criRque.   •  More  data  will  be  curated  and  archived,   benefiRng  the  ICPSR  membership  and  the   enRre  social  science  community.   •  Be3er  uRlize  resources  of  the  OR  community,   including  personal  relaRonships  and,   especially,  their  wide-­‐ranging  experRse.   •  Train  a  data  curaRon  community  of  support    
  • 16. Week  1  -­‐  IntroducRons  &  Data  Sources       Week  2  –  AcquisiRon       Week  3  -­‐  Review         Week  4  –  Processing       Week  5  –  Metadata       Week  6  –  DisseminaRon   Schedule  
  • 17. The  Virtual  Data  Enclave  (VDE)  provides  remote  access   to  quanJtaJve  data  in  a  secure  environment.  
  • 18. Lessons  Learned  
  • 19. Your  ideas  on  collaboraJve  curaJon?  
  • 20. Thank  you!   lyle@umich.edu    
  • 21. LEARNING TO CURATE @ EMORY Jen Doty and Rob O'Reilly
  • 22. Reasons to Participate ¨  well-timed with new RDM hires ¨  higher-up support for involvement in RDM projects RDAP14 Green Means Go! by Jack Mayer on Flickr / CC BY-NC-SA 2.0
  • 23. What's in it for us? ¨  learn from gold standard holders: ¤  ICPSR processing pipeline and tools ¤  implications of providing premium level service for staffing and resource allocation RDAP14 Nobel Prize Illustration by Howdy, I’m H. Michael Karshis on Flickr / CC BY 2.0
  • 24. The Data RDAP14 ¨  Panel Data - all states in the United States, 1972-2007, annual ¨  Coded Data - state-level data policies on home schooling, and relevant court cases ¨  Publicly-Available Data - a mix of demographic, economic, and social data from sources such as the BEA, the Census Bureau, the NCES ¨  No issues with regard to sensitivity of data or proprietary restrictions
  • 25. The Data
  • 26. Issues and Considerations RDAP14 ¨  Data assembled for particular project, not with long-term archiving and research in mind ¨  Discrepancies in documentation: ¤  variable names ¤  unclear citations ¤  broken URLs ¤  variables in data missing from codebook, and vice- versa
  • 27. Issues and Considerations, Cont. RDAP14 ¨  Long history with the Principal Investigator for the project, which meant lots of context about the project and the data ¨  Useful in clarifying ambiguities in the data, e.g. “it makes sense to us” citations ¨  Even with that context, there was still much work and back-and-forth involved
  • 28. Issues and Considerations, Cont. RDAP14 ¨  Absent that prior history, the climb would have been much more steep Steep climb up by lisa Angulo reid on Flickr / CC BY-NC 2.0
  • 29. Conclusions and Implications ¨  Overall: very impressive to “see how the sausage is made” ¤  ICPSR processing pipeline ¤  Hermes ¤  SDE infrastructure RDAP14 Sausage machine by Scoobyfoo on Flickr / CC BY-NC-ND 2.0
  • 30. Conclusions and Implications, Cont. ¨  Realistically, providing premium level of data archiving service is not possible with existing staffing levels and resources RDAP14 IBM 1620 in Computer Lab by euthman on Flickr / CC BY-SA
  • 31. Work in Progress RDAP14 ¨  Intent to archive dataset with ICPSR still holds, but delayed by: ¤  necessity for further documentation from investigators ¤  demands on our time from other projects ¨  Future plans for archiving datasets created by campus researchers informed by lessons learned from participating in pilot project
  • 32. Contact RDAP14 ¨  Jen Doty – jennifer.doty@emory.edu ¨  Rob O’Reilly – roreill@emory.edu
  • 33. Learning to Curate @Duke Joel Herndon Data and GIS Services Duke Libraries
  • 34. • Duke s Institutional Repository • Largely a home for scholarly publications and dissertations • A few data collections attached to papers, but limited research data
  • 35. Presidential Donor Survey

×