Software as a Well-Formed Research Object

  1. Software as a Well-Formed Research Object. DLF 2017 Forum, Pittsburgh, PA, October 24, 2017. Yasmin AlNoamany, John Borghi, Alexandra Chassanoff, Katherine Thornton
  2. Who we are: Yasmin AlNoamany, University of California, Berkeley; John Borghi, California Digital Library; Alex Chassanoff, MIT Libraries; Katherine Thornton, Yale University Library
  3. Background: 1st cohort of Software Curation Postdoctoral Fellows at CLIR. Spread across 2 coasts, 5 institutions. Wide range of areas being explored.
  4. Software Curation: Conceptual Challenges. What is software?
  5. Software Curation: Conceptual Challenges. What is curation?
  6. Software Curation: Social Challenges. Social Software in Scholarly Communications Software and Academic Incentives
  7. Software Curation: Technical Challenges ● Identifying ○ execution environment ○ dependencies and integrated libraries ○ data ○ metadata ○ individual components ● Evolution ● Compatibility ● Migration
  8. Software Curation: Current Work. Survey of Researcher Practices and Perceptions: UC Berkeley and California Digital Library
  9. Software Curation: Current Work. Research Questions: 1. How are researchers using software? 2. How do researchers share their software? 3. What do researchers value about their software? Areas of interest: 1. Software and reproducible research practices 2. Metrics for software
  10. Software Curation: Current Work. Background: 1. Increasing agreement that software and research-related code are important scholarly products 2. Research into how research software is mentioned and cited 3. Surveys into practices and perceptions around other research products (e.g., data)
  11. Software Curation: Current Work. Survey Design: 1. Goal was to capture as broad a view of researcher practices and perceptions as possible. 2. 56 questions: a. 53 multiple choice b. 3 open response
  12. Software Curation: Current Work. Distribution: 1. Approved by UC Berkeley IRB 2. Distributed via Qualtrics. Inclusion Criteria: 1. Participant had to consent, be over the age of 18, and say that they use software during the course of their research 2. Participant had to complete at least the demographic section.
  13. 215 researcher respondents
  14. Software Practices in Scientific Research
  15. Overview of Software Practices in Scientific Research
  16. Use of Research Software
  17. Open Source versus Commercial
  18. Coding Languages and Purpose
  19. Coding Languages and Purpose: 55.7% of researchers selected all five purposes; 86.4% of all languages
  20. Code Sharing Practices
  21. Most of the time, researchers share source code via email. In what format do you typically share your code? How do you share your code?
  22. Some reasons: ● “Not elegant” ● “Licensing issues” ● “Time pressure, time it takes to tidy up and document code” ● “require 'cleanup' and better commenting”
  23. Reproducibility Practices
  24. CS researchers tend to provide information about dependencies more often than researchers in other disciplines. Do you share related files (e.g. datasets) with your code? Do you provide information about dependencies?
  25. Preservation Practices
  26. 76.2% of researchers use GitHub to preserve their code. Where do you save your code or software so that it is preserved over the long term? How long do you typically save your code or software?
  27. How do you use software or code in your research? “Software is the main driver of my research and development program. I use it for everything from exploratory data analysis, to writing papers. Most of my research activities include the writing of code specifically aimed at the implementation of particular analytic methods.” “I use code to document in a reproducible manner all steps of data analysis, from collecting data from where they are stored (databases, spreadsheets, plain text files, etc.) to preparing the final reports (i.e. a set of scripts can fully reproduce a report or manuscript given the raw data, with little human intervention).”
  28. How do you define “sharing” and “preserving”? “I think of sharing code as making it publicly accessible, but not necessarily advertising it. I think of preserving code as depositing it somewhere remotely, where I can't accidentally delete it. I realize that GitHub should not be the end goal of code preservation, but as of yet I have not taken steps to preserve my code anywhere more permanently than GitHub.” “..."Sharing", to me, means that somebody else can discover and obtain the code, probably (but not necessarily) along with sufficient documentation to use it themselves. "Preserve" has stronger connotations. It implies a higher degree of documentation, both about the software itself, but also its history, requirements, dependencies, etc., and also feels more "official"- so my university's data repository feels more "preserve"-ish than my group's Github page.”
  29. Conclusion ● Researchers consider software to be as important as data ● Most researchers do differentiate sharing from preservation, but they need tools and guidance on how to preserve their code ● Time and licenses are the main constraints on sharing software
  30. Software Curation: Current Work, MIT Libraries ● Iterative approach ● Consider software ○ as an artifact with characteristics ○ as a research process → Software as a scholarly object in a digital scholarship ecosystem
  31. Software Curation: Current Work, MIT Libraries ● Software Curation Profiles ● Software Intake Form
  32. Software Curation: Current Work. Strategic thinking for institutions ● Define communities of practice ● Identify boundaries for software as a scholarly object ● Identify preservation outcomes + curation activities ● Don’t Let Perfect Be the Enemy of Good
  33. Software Curation: Current Work at Yale. Legacy software in library collections. CD-ROMs and floppy disks at risk of deterioration. Library might not have relevant computing platform. Cataloged according to principles of traditional MARC-based description.
  34. Emulation as a Service: http://bw-fla.uni-freiburg.de/ Developed by Albert Ludwigs Universität Freiburg
  35. EaaS and Wikidata
  36. Wikidata for Digital Preservation: Describing software, file formats, and configured environments in Wikidata. Proposing necessary properties to extend data models.
  37. Thank you! Yasmin, John, Alex, Katherine
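
The technical challenges slide names identifying the execution environment and dependencies as part of curating software. As a rough illustration that is not part of the original talk, the sketch below records that information for a Python environment using only the standard library. The function name `capture_environment` and the keys of the resulting record are assumptions chosen for this example, not a schema proposed by the presenters.

```python
import json
import platform
import sys
from importlib import metadata


def capture_environment():
    """Return a dictionary describing the current Python execution environment.

    Hypothetical helper for illustration; the field names are assumptions.
    """
    # Installed distributions stand in for "dependencies and integrated libraries".
    dependencies = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip installs whose metadata lacks a name
            dependencies[name] = dist.version

    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "operating_system": platform.system(),
        "machine": platform.machine(),
        "executable": sys.executable,
        "dependencies": dict(sorted(dependencies.items())),
    }


if __name__ == "__main__":
    # Print the record as JSON so it can be saved next to the code.
    print(json.dumps(capture_environment(), indent=2))
```

Saving this output alongside shared code is one lightweight way to provide the dependency information that the survey asked researchers about. It covers only the Python layer, so system libraries, data, and other components would still need to be documented separately.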