RDAP14 Poster: Ashley Sands Who will manage scientific research data?


Research Data Access & Preservation Summit
March 26-28, 2014
San Diego, CA

Who will manage scientific research data?
Ashley Sands, University of California, Los Angeles


Who will manage scientific research data? A study of the emerging astronomy and library data management workforces

Ashley Sands
UCLA, Department of Information Studies

Research Questions

This study examines how scientists, librarians, and other data managers address the following questions: What scientific research data should be managed? What is research data management? What knowledge and expertise are significant to manage research data? Specifically, how and why are the SDSS and LSST data curated and preserved?

The Sloan Digital Sky Survey

The Sloan Digital Sky Survey (SDSS) is one of the most ground-breaking surveys in the history of astronomy. The survey covered over a quarter of the night sky with high-quality optical and spectroscopic imaging. The first phase of the SDSS project (SDSS-I) ran from 2000-2005, the second (SDSS-II) from 2005-2008, and subsequent related projects continue today. The SDSS data are openly available to astronomers and the general public through data releases. The SDSS-I/II collection constitutes 130-180 terabytes of astronomical observations.

After data collection finished, four formal agreements (Memoranda of Understanding) established how the collection would be cared for over the next five years, concluding January 2014. The data were transferred from a national laboratory to two different university libraries; that is, they moved from one kind of workforce to two others.

Research Sites and Population

This project investigates astronomy data practices in three communities: the Sloan Digital Sky Survey (SDSS) collaboration, the Large Synoptic Survey Telescope (LSST) collaboration, and the library and archive workforces partnered with these and other astronomy collaborations.

The study population includes astronomy faculty, students, and staff; library and archive staff; computer science staff; software engineers and programmers; and administrators from the nationally distributed institutions involved in the SDSS and LSST.

Research Methods

This study builds on existing interviews, ethnographic participant observation, and document analysis conducted since Fall 2011. Seven weeks of participant observation and 35 interviews with key individuals in the SDSS collaboration have already been performed. Future work will be conducted with LSST collaboration members.

Semi-structured interviews are the primary form of data collection. Interviews include questions revised from existing team protocols that engage researchers on their data practices; their understanding of data management, archival, and preservation activities; and how they perceive their data management knowledge and expertise needs. Participant observation complements the interviews by examining whether there are important distinctions between what the interviewees express formally and their observed daily practices. The collected transcripts, field notes, and documents are analyzed for how and why practices differ between and among communities.

Preliminary Findings

Examination of the SDSS data transfer has revealed the difficulties involved in large-scale data stewardship and the importance of domain knowledge as librarians and others manage scientific data. Findings show that what it means to curate and preserve the SDSS data differs between communities. The two libraries charged with managing the dataset went about the task differently. This study shows that the definition of the dataset and what it means to curate the data are not agreed upon within or among the workforces.

The two libraries tasked with caring for the SDSS data were made up of distinct staff members, each with individual educational and experiential backgrounds. For example, many of the first library's staff members hold professional librarian master's degrees. On the other hand, none of the staff charged with managing the data at the second library hold the degree; instead, they have more experience in information technology and software development. The education and past work experiences of the staff at the two libraries led to distinct modes of operation when it came to prioritizing care for the SDSS data. The two distinct workforces brought about divergent ways of managing the dataset.

Preliminary Conclusions

A large number of people, with diverse educations and work experiences, are involved in SDSS data management: domain scientists, computer scientists, software and systems engineers, programmers, librarians, and archivists. SDSS data management does not fall under the purview of any one kind of expertise. Instead, a variety of backgrounds and educational histories emerge from the workforce. The SDSS case study demonstrates that effective data management encompasses multiple tasks, requires composite types of expertise, and emerges from multiple workforces.

SDSS Image: This is the globular cluster Palomar 5, which is a cluster of stars orbiting the Milky Way at a distance of 210 thousand light years. Most of the fainter stars in the picture belong to the cluster; the brighter stars are foreground stars elsewhere in the Milky Way.

This research is funded by the U.S. National Science Foundation ("Data Conservancy" OCI0830976, S. Choudhury, PI, Johns Hopkins University, and "Knowledge & Data Transfer: the Formation of a New Workforce" #1145888. C.L. Borgman, PI; S. Traweek, Co-PI) and the Alfred P. Sloan Foundation ("The Transformation of Knowledge, Culture, and Practice in Data-Driven Science: A Knowledge Infrastructures Perspective" #20113194. C.L. Borgman, PI; S. Traweek, Co-PI). Thank you to the members of the UCLA Knowledge Infrastructures Team, which includes Christine L. Borgman, Peter T. Darch, Sharon Traweek, and Jillian C. Wallis. @UCLA_KI