RDAP14: Learning to Curate Panel

Research Data Access and Preservation Summit, 2014
San Diego, CA
March 26-28, 2014

Jared Lyle, ICPSR
Jennifer Doty, Emory University
Joel Herndon, Duke University
Libbie Stephenson, University of California, Los Angeles

Published in: Education

  1. Learning to Curate: Lessons from an ICPSR Pilot. Jared Lyle, RDAP 2014
  2. http://www.icpsr.umich.edu
  3. Background
  4. Data Sharing (N=935)

     Federal agency   Shared formally,    Shared informally,     Not shared
                      archived (n=111)    not archived (n=415)   (n=409)
     NSF (27.3%)      22.4%               43.7%                  33.9%
     NIH (72.7%)       7.4%               45.0%                  47.6%
     Total            11.5%               44.6%                  43.9%

     Source: Pienta, Alter, & Lyle (2010). “The Enduring Value of Social Science Research: The Use and Reuse of Primary Research Data”. http://hdl.handle.net/2027.42/78307
  5. Vines et al. Current Biology 24, 94–97, January 6, 2014. http://dx.doi.org/10.1016/j.cub.2013.11.014 Image: http://www.peerreviewcongress.org/2013/Plenary-Session-Abstracts-9-9.pdf
  6. What is Curation?
  7. A well-prepared data collection “contains information intended to be complete and self-explanatory” for future users.
  8. A corollary: Do no harm. http://img.gawkerassets.com/img/17xbuy519gga2jpg/ku-xlarge.jpg
  9. Collaborative Curation
  10. Partnerships
      Green, Ann G., and Myron P. Gutmann. (2007) "Building Partnerships Among Social Science Researchers, Institution-based Repositories, and Domain Specific Data Archives." OCLC Systems and Services: International Digital Library Perspectives. 23: 35-53. http://hdl.handle.net/2027.42/41214
      “We propose that domain specific archives partner with institution based repositories to provide expertise, tools, guidelines, and best practices to the research communities they serve.”
  11. Support:
  12. Ron Nakao, Stanford; Libbie Stephenson, UCLA; Jon Stiles, UC Berkeley; Jen Doty, Emory; Rob O’Reilly, Emory; Joel Herndon, Duke
  13. Pilot Goals
  14. For participants:
      • Apply curation theories to practice through actual data processing.
      • Will have a fully curated data collection ready for archiving at the end of the session.
      • Interact with and ask questions of other data specialists within a working environment.
      • Gain first-hand experience using ICPSR’s internal tools and workflows for curation.
      • Understand level of effort to work through collections and provide assistance to researchers.
      • Learn about things not thought about (e.g., costing, standardized workflows).
  15. For ICPSR:
      • Engage with outside data curators to learn what others are doing and thinking.
      • Polish internal procedures and tools by opening them to outside review and critique.
      • More data will be curated and archived, benefiting the ICPSR membership and the entire social science community.
      • Better utilize resources of the OR community, including personal relationships and, especially, their wide-ranging expertise.
      • Train a data curation community of support
  16. Schedule
      Week 1: Introductions & Data Sources
      Week 2: Acquisition
      Week 3: Review
      Week 4: Processing
      Week 5: Metadata
      Week 6: Dissemination
  17. The Virtual Data Enclave (VDE) provides remote access to quantitative data in a secure environment.
  18. Lessons Learned
  19. Your ideas on collaborative curation?
  20. Thank you! lyle@umich.edu
  21. LEARNING TO CURATE @ EMORY. Jen Doty and Rob O'Reilly
  22. Reasons to Participate
      • well-timed with new RDM hires
      • higher-up support for involvement in RDM projects
      Image: Green Means Go! by Jack Mayer on Flickr / CC BY-NC-SA 2.0
  23. What's in it for us?
      • learn from gold standard holders:
        - ICPSR processing pipeline and tools
        - implications of providing premium level service for staffing and resource allocation
      Image: Nobel Prize Illustration by Howdy, I’m H. Michael Karshis on Flickr / CC BY 2.0
  24. The Data
      • Panel Data - all states in the United States, 1972-2007, annual
      • Coded Data - state-level data policies on home schooling, and relevant court cases
      • Publicly-Available Data - a mix of demographic, economic, and social data from sources such as the BEA, the Census Bureau, the NCES
      • No issues with regard to sensitivity of data or proprietary restrictions
  25. The Data
  26. Issues and Considerations
      • Data assembled for particular project, not with long-term archiving and research in mind
      • Discrepancies in documentation:
        - variable names
        - unclear citations
        - broken URLs
        - variables in data missing from codebook, and vice versa
  27. Issues and Considerations, Cont.
      • Long history with the Principal Investigator for the project, which meant lots of context about the project and the data
      • Useful in clarifying ambiguities in the data, e.g. “it makes sense to us” citations
      • Even with that context, there was still much work and back-and-forth involved
  28. Issues and Considerations, Cont.
      • Absent that prior history, the climb would have been much more steep
      Image: Steep climb up by lisa Angulo reid on Flickr / CC BY-NC 2.0
  29. Conclusions and Implications
      • Overall: very impressive to “see how the sausage is made”
        - ICPSR processing pipeline
        - Hermes
        - SDE infrastructure
      Image: Sausage machine by Scoobyfoo on Flickr / CC BY-NC-ND 2.0
  30. Conclusions and Implications, Cont.
      • Realistically, providing premium level of data archiving service is not possible with existing staffing levels and resources
      Image: IBM 1620 in Computer Lab by euthman on Flickr / CC BY-SA
  31. Work in Progress
      • Intent to archive dataset with ICPSR still holds, but delayed by:
        - necessity for further documentation from investigators
        - demands on our time from other projects
      • Future plans for archiving datasets created by campus researchers informed by lessons learned from participating in pilot project
  32. Contact
      • Jen Doty: jennifer.doty@emory.edu
      • Rob O’Reilly: roreill@emory.edu
  33. Learning to Curate @Duke. Joel Herndon, Data and GIS Services, Duke Libraries
  34. Duke's Institutional Repository
      • Largely a home for scholarly publications and dissertations
      • A few data collections attached to papers, but limited research data
  35. Presidential Donor Survey
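
The Total row of the data-sharing figures on slide 4 can be read as an average of the NSF and NIH rows, weighted by each agency's share of the N=935 awards. The following is a minimal sketch of that arithmetic, using only the percentages printed on the slide; the variable and function names are illustrative and do not come from the presentation or the cited paper.

```python
# Percentages as printed on slide 4 (Pienta, Alter, & Lyle 2010).
# All names below are illustrative, not taken from the source.
agency_weight = {"NSF": 27.3, "NIH": 72.7}  # share of the N=935 awards, in percent

shares = {
    "NSF": {"archived": 22.4, "informal": 43.7, "not_shared": 33.9},
    "NIH": {"archived": 7.4, "informal": 45.0, "not_shared": 47.6},
}

reported_total = {"archived": 11.5, "informal": 44.6, "not_shared": 43.9}


def weighted_total(outcome):
    """Average one outcome across agencies, weighted by agency share."""
    return sum(agency_weight[a] * shares[a][outcome] for a in agency_weight) / 100.0


recomputed = {outcome: round(weighted_total(outcome), 1) for outcome in reported_total}

for outcome in reported_total:
    print(outcome, recomputed[outcome], reported_total[outcome])
```

Under this reading, the recomputed values agree with the slide's Total row to one decimal place, which suggests the row labels and percentages are internally consistent.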

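
Slide 26 lists "variables in data missing from codebook, and vice versa" among the documentation discrepancies. One simple way to surface that kind of mismatch is a set comparison of variable names. The sketch below is an assumption about how such a check might look, not a description of ICPSR's tools or the Emory team's process; the function name and the example variable names are hypothetical, and the case-insensitive matching is a design choice that may not suit every dataset.

```python
def compare_variables(data_columns, codebook_variables):
    """Report variable names found in only one of the two sources.

    Names are compared case-insensitively with surrounding whitespace
    stripped (an assumption; some datasets treat case as significant).
    """
    data = {name.strip().lower() for name in data_columns}
    book = {name.strip().lower() for name in codebook_variables}
    return {
        "missing_from_codebook": sorted(data - book),
        "missing_from_data": sorted(book - data),
    }


# Hypothetical variable names, for illustration only.
data_columns = ["state", "year", "hs_law", "med_income"]
codebook_variables = ["State", "Year", "hs_law", "court_case"]

report = compare_variables(data_columns, codebook_variables)
for problem, names in report.items():
    print(problem, names)
```

A report like this only flags name mismatches; unclear citations and renamed variables still need the kind of back-and-forth with investigators described on slide 27.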