Data Management for Undergraduate Researchers (updated - 02/2016)

Spring 2016 version of Office of Undergraduate Research workshop.

Published in: Education

  1. 1. Data Management for Undergraduate Researchers Office of Undergraduate Research Seminar and Workshop Series Rebekah Cummings, Research Data Management Librarian J. Willard Marriott Library, University of Utah February 23, 2016
  2. 2. • Introductions • What are data? • Why manage data? • Data Management Plans • Data Organization • Metadata • Storage and Archiving • Questions
  3. 3. Name, Major, Research Project
  4. 4. What is data management? The process of controlling the information (read: data) generated during a research project. https://www.libraries.psu.edu/psul/pubcur/what_is_dm.html
  5. 5. What are data? “The recorded factual material commonly accepted in the research community as necessary to validate research findings.” - U.S. OMB Circular A-110
  6. 6. Data are diverse
  7. 7. Data are messy
  8. 8. Why manage data? • Save time and increase efficiency • Meet grant requirements • Promote reproducible research • Enable new discoveries from your data • Make the results of publicly funded research publicly available
  9. 9. We are trying to avoid this scenario…
  10. 10. Two bears data management problems 1. Didn’t know where he stored the data 2. Saved one copy of the data on a USB drive 3. Data was in a format that could only be read by outdated, proprietary software 4. No codebook to explain the variable names 5. Variable names were not descriptive 6. No contact information for the co-author Sam Lee
  11. 11. Scenario: You develop a research project during your undergraduate experience. You write up the results, which are accepted by a reputable journal. People start citing your work! Three years later someone accuses you of falsifying your work. Scenario adapted from MANTRA training module
  12. 12. • Would you be able to prove you did the work as you described in the article? • What would you need to prove you hadn’t falsified the data? • What should you have done throughout your research study to be able to prove you did the work as described?
  13. 13. Data Management Plans • What data are generated by your research? • What is your plan for managing the data? • How will your data be shared?
  14. 14. Elements of a DMP • Types of data, including file formats • Data description • Data storage • Data sharing, including confidentiality or security restrictions • Data archiving and responsibility • Data management costs
  15. 15. DMPTool – CDL
  16. 16. Data organization
  17. 17. File naming
  18. 18. MyData.xls MeetingNotes.doc Presentation.ppt Assignment1.pdf
  19. 19. File naming best practices 1. Be descriptive not generic 2. Appropriate length (about 25 chars or less) 3. Be consistent 4. Think critically about your file names
  20. 20. File naming best practices • File names should include only letters, numbers, and underscores/dashes. • No special characters • No spaces; use dashes, underscores, or camel case (like-this or likeThis) • Avoid case dependency. Assume this, THIS, and tHiS are the same. • Have a strategy for version control. • Don’t overwrite file extensions
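The character and case rules on this slide can be checked automatically. The following is a minimal sketch, not part of the original workshop; the function names `check_file_name` and `find_case_collisions` are hypothetical, and the sketch assumes every file name should carry an extension.

```python
import re
from collections import defaultdict

# Letters, numbers, underscores, and dashes only (the rule stated on the slide).
ALLOWED_STEM = re.compile(r"^[A-Za-z0-9_-]+$")


def check_file_name(name):
    """Return a list of naming problems; an empty list means the name passes."""
    problems = []
    stem, dot, extension = name.rpartition(".")
    if not dot or not extension:
        problems.append("missing file extension")
    if not dot:
        # No "." at all, so the whole name is the stem.
        stem = name
    if " " in stem:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them when looking for other characters.
    if not ALLOWED_STEM.match(stem.replace(" ", "")):
        problems.append("contains special characters")
    return problems


def find_case_collisions(names):
    """Group names that differ only by upper/lower case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]


print(check_file_name("MyData.xls"))
print(check_file_name("July 24 2014_SoilSamples%_v6"))
print(find_case_collisions(["this.txt", "THIS.txt", "other.txt"]))
```

A check like this only covers the mechanical rules; whether a name is descriptive still takes human judgment.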
  21. 21. One potential strategy
  22. 22. Version Control - Numbering Use leading zeros for scalability: 001 002 003 009 010 099 sorts in order, while 1 10 2 3 9 99 does not. Bonus Tip: Use whole numbers (v1, v2, v3) for major version changes and decimals for minor changes (v1.1, v2.6)
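Leading zeros matter because most file browsers sort names as text. A small sketch, using the same numbers as the slide, shows the difference; the helper `versioned_name` and the file stem in its docstring are hypothetical.

```python
def versioned_name(stem, version, extension, width=3):
    """Build a name such as 'SoilSamples_007.csv' with a zero-padded version number."""
    return f"{stem}_{version:0{width}d}.{extension}"


versions = [1, 2, 3, 9, 10, 99]

# Sorted as text, unpadded numbers fall out of numeric order.
unpadded = sorted(str(v) for v in versions)
print(unpadded)  # ['1', '10', '2', '3', '9', '99']

# Zero-padded numbers sort the same way as the numbers themselves.
padded = sorted(f"{v:03d}" for v in versions)
print(padded)  # ['001', '002', '003', '009', '010', '099']
```

A width of 3 covers up to 999 versions; pick the width before the first file is saved, since changing it later means renaming files.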
  23. 23. Version Control - Dates If using dates, use YYYYMMDD. June2015 = BAD! 06-18-2015 = BAD! 20150618 = GREAT! 2015-06-18 = This is fine too
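If file names are generated by a script, the date stamp can come straight from a date object rather than being typed by hand. This is a sketch under assumptions: `dated_file_name` and its argument order (date, project, content) are illustrative, not a convention prescribed by the workshop.

```python
from datetime import date


def dated_file_name(event_date, project, content, extension):
    """Prefix a file name with the event date in YYYYMMDD form."""
    stamp = event_date.strftime("%Y%m%d")
    return f"{stamp}_{project}_{content}.{extension}"


print(dated_file_name(date(2015, 6, 18), "NSF", "SoilSamples", "csv"))
# 20150618_NSF_SoilSamples.csv

# The hyphenated form comes from isoformat().
print(date(2015, 6, 18).isoformat())  # 2015-06-18

# Year-first stamps sort chronologically even when sorted as plain text.
stamps = ["20150618", "20140724", "20151201"]
print(sorted(stamps))  # ['20140724', '20150618', '20151201']
```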
  24. 24. From a DMP… “Each file name, for all types of data, will contain the project acronym PUCCUK; a reference to the file content (survey, interview, media) and the date of an event (such as the date of an interview).”
  25. 25. Who filed better? • PLPP_EvaluationData_Workshop2_2014.xlsx • MyData.xlsx • publiclibrarypartnershipsprojectevaluationdataworkshop22014CummingsHelenaMontana.xlsx
  26. 26. Who filed better? • July 24 2014_SoilSamples%_v6 • 20140724_NSF_SoilSamples_Cummings • SoilSamples_FINAL
  27. 27. Structuring folders and files • Consider all the types of files you will handle during the course of your project. • Develop a nested folder structure that makes sense for your project and your team’s retrieval needs. • Name folders clearly, without special characters or redundancy. • Use a standard folder structure for each project or subproject (including making folders for files not yet created). • Create a reference document (README file) that notes the purpose of each folder. University of Massachusetts Medical School Library http://libraryguides.umassmed.edu/file_management
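One way to keep a standard structure across projects is to create the folders and the README from a short script. The sketch below is an assumption-laden example: the folder names and purposes in `DEFAULT_FOLDERS` are hypothetical placeholders, and `create_project_skeleton` is not a tool from the workshop.

```python
from pathlib import Path

# Hypothetical folder layout; replace with whatever suits the project.
DEFAULT_FOLDERS = {
    "data_raw": "original data, never edited",
    "data_processed": "cleaned or derived data",
    "docs": "proposals, approvals, codebooks",
    "code": "scripts used for analysis",
    "output": "figures, tables, drafts",
}


def create_project_skeleton(root, folders=None):
    """Create the project folders plus a README.txt describing each one."""
    folders = DEFAULT_FOLDERS if folders is None else folders
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    lines = [f"Project: {root.name}", "", "Folder purposes:"]
    for name, purpose in folders.items():
        # Folders are created up front, even for files that do not exist yet.
        (root / name).mkdir(exist_ok=True)
        lines.append(f"- {name}: {purpose}")
    readme = root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling `create_project_skeleton("PLPP_2014")` would create the five folders and a README.txt inside a `PLPP_2014` directory in the current working directory.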
  28. 28. File organization exercise
  29. 29. Describing data
  30. 30. Research Documentation • Grant proposals and related reports • Applications and approvals (e.g. IRB) • Codebooks, data dictionaries • Consent forms • Surveys, questionnaires, interview protocols • Transcripts, hard copies of audio and video files • Any software or code you used (no matter how insignificant or buggy)
  31. 31. Three levels of documentation • Project level – what the study set out to do, research questions, methods, sampling frames, instruments, protocols, members of the research team • File or database level – How all the files relate to one another. A README file is a classic way of capturing this information. • Variable or item level – Full label explaining the meaning of each variable. http://datalib.edina.ac.uk/mantra/documentation_metadata_citation/
  32. 32. IJ? XVAR? FNAME?
  33. 33. What goes in a codebook? • Variable name • Variable meaning • Variable data types • Precision of data • Units • Known issues with the data • Relationships to other variables • Null values • Anything else someone needs to better understand the data
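A codebook can be as simple as a CSV file with one row per variable. The sketch below covers several of the items listed on this slide; the column names in `CODEBOOK_FIELDS` and the example variable are invented for illustration and do not describe any real dataset.

```python
import csv
import io

# Hypothetical column names, chosen to mirror the codebook items listed on the slide.
CODEBOOK_FIELDS = [
    "variable_name",
    "meaning",
    "data_type",
    "precision",
    "units",
    "null_value",
    "known_issues",
]


def codebook_to_csv(entries):
    """Serialize codebook entries (a list of dicts) to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CODEBOOK_FIELDS, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        # Keys missing from an entry are written as empty cells.
        writer.writerow(entry)
    return buffer.getvalue()


# Made-up example row, for illustration only.
example_entries = [
    {
        "variable_name": "soil_ph",
        "meaning": "Acidity of the soil sample",
        "data_type": "float",
        "precision": "0.1",
        "units": "pH",
        "null_value": "-999",
    }
]

print(codebook_to_csv(example_entries))
```

Saving the result next to the data file, under a name such as `codebook.csv`, keeps the explanation of each variable with the data it describes.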
  34. 34. Metadata • Unstructured data: There was a study put out by Dr. Gary Bradshaw from the University of Nebraska Medical Center in 1982 called “Growth of Rodent Kidney Cells in Serum Media and the Effect of Viral Transformation On Growth”. It concerns the cytology of kidney cells. • Structured data: Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth. Author: Gary Bradshaw. Date: 1982. Publisher: University of Nebraska Medical Center. Subject: Kidney -- Cytology
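The practical payoff of structured metadata is that software can search and exchange it. As a rough illustration, the structured record from this slide can be held as key-value pairs and saved as JSON; the lowercase field names and the helper `find_by_field` are assumptions of this sketch, not a metadata standard.

```python
import json

# The structured record from the slide, expressed as key-value pairs.
record = {
    "title": "Growth of rodent kidney cells in serum media and the effect of viral transformations on growth",
    "author": "Gary Bradshaw",
    "date": "1982",
    "publisher": "University of Nebraska Medical Center",
    "subject": "Kidney -- Cytology",
}


def find_by_field(records, field, value):
    """Return every record whose field exactly matches the value."""
    return [r for r in records if r.get(field) == value]


# Structured fields can be queried directly, which free text does not allow.
print(find_by_field([record], "date", "1982"))

# JSON is a plain-text, non-proprietary way to store the record.
print(json.dumps(record, indent=2))
```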
  35. 35. Data Storage
  36. 36. LOCKSS (Lots of Copies Keeps Stuff Safe)
  37. 37. Options for data storage • Personal computers or laptops • Networked drives • External storage devices
  38. 38. 3-2-1 Backup Rule: Have 3 copies of your data, on 2 different media, in more than 1 physical location
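Making the extra copies can be scripted, and a checksum confirms that each copy matches the original. This is a minimal sketch: `back_up` and `sha256_of` are hypothetical helper names, and the script cannot tell whether the destinations really sit on different media or in different locations, so that part of the rule remains the researcher's responsibility.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destination_dirs):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for directory in destination_dirs:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        target = directory / source.name
        shutil.copy2(source, target)  # copy2 also preserves file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies
```

With the original plus two destinations, for example an external drive and a networked or cloud-synced folder, this yields the three copies the rule asks for.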
  39. 39. Ubox – box.utah.edu
  40. 40. Language from a DMP “All data files will be stored on the University server that is backed up nightly. The University's computing network is protected from viruses by a firewall and anti-virus software. Digital recordings will be copied to the server each day after interviews. Signed consent forms will be stored in a locked cabinet in the office. Interview recordings and transcripts, which may contain personal information, will be password protected at file-level and stored on the server. Original versions of the files will always be kept on the server. If copies of files are held on a laptop and edits made, their file names will be changed.”
  41. 41. Thinking long-term
  42. 42. Archiving options • Domain-specific repository • General Purpose Data Repository • Institutional repository
  43. 43. When you archive… • Save the data in both its proprietary and non-proprietary format (e.g. Excel and CSV; Microsoft Word and ASCII) • Consider any restrictions on your data (copyright, patent, privacy, etc.) • When possible/mandated/desired, share your data online with a persistent identifier (DOI or ARK) • Include a data citation and state how you want to get credit for your data • Link your data to your publications as often as possible
  44. 44. Major takeaways • Data management starts at the beginning of a project • Document your data so that someone else could understand it • Have more than one copy of your data • Consider archiving options when you are done with your project
  45. 45. Questions? Rebekah Cummings rebekah.cummings@utah.edu (801) 581-7701 Marriott Library, 1705Y …or ask now!
