Practical Data Management - ACRL DCIG Webinar

Slides from an ACRL DCIG webinar from 30 April 2014 discussing basic data management practices in file organization and naming, documentation, storage and backup, and making files usable in the future.

Notes
  • I’m excited to be speaking today about practical data management because it is a topic near and dear to me. Five years ago I worked in a place like this, as a chemistry researcher doing laser spectroscopy. Working with my data was my favorite part of the job, but it was also one of the most frustrating aspects of being a researcher. I had no training in data management, so I made things up as I went (not always successfully). I also spent a year reproducing another person’s results, and nothing reveals how inadequate most data practices are quite like working with someone else’s data.
  • Now I help researchers with their data management at the University of Wisconsin-Milwaukee. In fact, this webinar is based on the workshop I teach to researchers there.
  • But data management is not just for researchers. Librarians need these skills too, particularly those who want to curate research datasets. I’m really glad to be giving an ACRL DCIG (Digital Curation Interest Group) webinar because I think data management and data curation are closely connected.
  • The connection between data management and data curation was apparent at the recent Research Data Access and Preservation (RDAP) conference during the “Learning to Curate” panel. This slide from the Emory group sums up the issue nicely: the major challenges in curating research datasets are not preservation issues but data management issues. So if we want to curate research datasets easily, we need to work with researchers on data management so that data reach us in a form that is easy to curate. Plus, data management is a skill that most researchers need, so teaching it lets us provide a direct benefit to researchers while furthering our curation goals.
  • Consistent and correct naming schemes are important, as this recent retraction for “error in coding” shows. Mislabelling meant that the analysis was done on the wrong samples, which affected the paper’s results.
  • There are so many horror stories about missing backups, and Toy Story 2 has one of the best. See video: https://www.youtube.com/watch?v=EL_g0tyaIeE&feature=player_detailpage
  • This one has affected me personally: I no longer have access to my PhD data, even though it is less than 5 years old. My files are locked up in a proprietary format, and I lost access to the necessary software when I left the lab. If I had done a little work ahead of time, I wouldn’t be in this position.
  • I encourage you to teach and share these data management strategies with your users. My slides are available under a CC-BY license, so feel free to modify and reuse them.
  • Also, dive into these practices for yourself. They will help you manage your own data.
  • Remember that good data management is the accumulation of many small practices. The best way to improve your practices is to make one small change at a time. Any small improvement makes it easier to work with your data. I challenge you to take one of the practices outlined in this talk and adopt it to improve your digital file practices.

Transcript

  • 1. Practical Data Management ACRL DCIG Webinar 30 April 2014 Kristin Briney, PhD
  • 2. andrius.v, https://www.flickr.com/photos/banditaz/6823875954 (CC BY-NC-SA)
  • 3. Mr.TinDC, https://www.flickr.com/photos/mr_t_in_dc/5940438148 (CC BY-ND)
  • 4. International Institute of Tropical Agriculture, https://www.flickr.com/photos/iita-media-library/8160877379 (CC BY-NC) Musgo Dumio_Momio, https://www.flickr.com/photos/30976576@N07/2903662286 (CC BY-NC-SA)
  • 5. Jen Doty and Rob O'Reilly, “Learning to Curate @ Emory”. RDAP 2014
  • 6. Data Management Basics • Introduction to a few topics in data management – File organization and naming – Documentation – Storage and backups – Future file usability
  • 7. Data Management Basics • Introduction to a few topics in data management – File organization and naming – Documentation – Storage and backups – Future file usability → Teach & Use
  • 8. For each minute of planning at beginning of a project, you will save 10 minutes of headache later
  • 9. FILE ORGANIZATION & NAMING Dan Zen, http://www.flickr.com/photos/danzen/5551831155/ (CC BY)
  • 10. File Organization • What? – Keeping your files in order
  • 11. File Organization • Why? – Easier to find and use data – Tell, at a glance, what is done and what you have yet to do – Can still find and use files in the future
  • 12. File Organization • When? – Always! – Get in the habit of putting files in the right place
  • 13. File Organization • How? – Any system is better than none – Make your system logical for your data • 80/20 Rule – Possibilities • By project • By analysis type • By date • …
  • 14. Example • Thesis – By chapter • By file type (draft, figure, table, etc.) • Data – By researcher • By analysis type – By date
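An organization scheme like the one in slides 13 and 14 can be set up once, at the start of a project, with a short script. The sketch below assumes a hypothetical layout modeled on the slide's example: thesis files grouped by chapter and then file type, and data grouped by researcher, analysis type, and date. The folder names are placeholders, not a recommended standard.

```python
from pathlib import Path

# Hypothetical layout modeled on the slide's example; rename to fit your project.
LAYOUT = [
    "Thesis/Chapter1/drafts",
    "Thesis/Chapter1/figures",
    "Thesis/Chapter1/tables",
    "Data/ResearcherA/UVVis/2014-04-22",  # researcher / analysis type / date
    "Data/ResearcherB/IR/2014-04-24",
]


def make_layout(root: Path, layout: list[str]) -> list[Path]:
    """Create each folder in `layout` under `root` and return the folder paths."""
    created = []
    for rel in layout:
        folder = root / rel
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created
```

Calling `make_layout(Path("MyProject"), LAYOUT)` creates the tree under `MyProject`. Running it again is harmless because existing folders are left alone.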
  • 15. File Naming Conventions • What? – Consistent naming for files
  • 16. http://retractionwatch.com/2014/01/07/doing-the-right-thing-authors-retract-brain-paper-with-systematic-human-error-in-coding/
  • 17. File Naming Conventions • Why? – Make it easier to find files – Avoid duplicates – Make it easier to wrap up a project because you know which files belong to it
  • 18. File Naming Conventions • When? – For a group of related files (3 to 1000+) – May need different conventions for different groups
  • 19. File Naming Conventions • How? – Pick what is most important for your name • Date • Site • Analysis • Sample • Short description
  • 20. File Naming Conventions • How? – Files should be named consistently – File names should be descriptive but short (<25 characters) – Use underscores instead of spaces – Avoid these characters: “ / : * ? ‘ < > [ ] & $ – Use the dating convention: YYYY-MM-DD
  • 21. Example • YYYYMMDD_site_sampleNum – 20140422_PikeLake_03 – 20140424_EastLake_12 • Analysis-sample-concentration – UVVis-stilbene-10mM – IR-benzene-pure
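The naming rules in slides 20 and 21 are easy to check mechanically. The sketch below builds names in the slide's YYYYMMDD_site_sampleNum pattern and flags names that break the slide's guidelines. The function names are hypothetical, and so is the exact set of forbidden characters, which adds straight- and curly-quote variants to the slide's list.

```python
import datetime

# Characters the slide says to avoid; the extra quote variants are an assumption.
FORBIDDEN = set('"“”\'‘’/:*?<>[]&$')
MAX_LEN = 25  # slide guideline: names should be under 25 characters


def make_name(date: datetime.date, site: str, sample: int) -> str:
    """Build a name in the slide's YYYYMMDD_site_sampleNum pattern."""
    site = site.replace(" ", "")  # "Pike Lake" -> "PikeLake", as in the slide example
    return f"{date:%Y%m%d}_{site}_{sample:02d}"


def problems(name: str) -> list[str]:
    """Return the guideline violations found in a file name (extension excluded)."""
    issues = []
    if len(name) >= MAX_LEN:
        issues.append(f"{MAX_LEN} characters or longer")
    if " " in name:
        issues.append("contains spaces; use underscores")
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        issues.append("contains forbidden characters: " + "".join(bad))
    return issues
```

For example, `make_name(datetime.date(2014, 4, 22), "Pike Lake", 3)` returns `"20140422_PikeLake_03"`, and `problems` on that name returns an empty list.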
  • 22. DOCUMENTATION Brady, https://www.flickr.com/photos/freddyfromutah/4424199420 (CC BY)
  • 23. What would someone unfamiliar with your data need in order to find, evaluate, understand, and reuse them?
  • 24. Documentation • Why? – Data without notes are unusable – Because you won’t remember everything – For others who may need to use your files
  • 25. Documentation • When? – Always – Documentation needs will vary between files
  • 26. Documentation • How? – Take good notes – Metadata schemas • http://www.dcc.ac.uk/resources/metadata-standards
  • 27. Documentation • How? – Methods • Protocols • Code • Survey • Codebook • Data dictionary • Anything that lets someone reproduce your results
  • 28. Documentation • How? – Templates • Like structured metadata but easier • Decide on a list of information before you collect data – Make sure you record all necessary details – Takes a few minutes upfront, easy to use later • Print and post in prominent place or use as worksheet
  • 29. Example • I need to collect: – Date – Experiment – Scan number – Powers – Wavelengths – Concentration (or sample weight) – Calibration factors, like timing and beam size
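A template like the one in slides 28 and 29 can also live in code, so an incomplete record is caught while you can still fill the gap. This is a minimal sketch. The field keys are hypothetical snake_case versions of the slide's list, and the printable worksheet mirrors the slide's "print and post" suggestion.

```python
# Fields from the slide's example template; the snake_case keys are hypothetical.
TEMPLATE = (
    "date",
    "experiment",
    "scan_number",
    "powers",
    "wavelengths",
    "concentration_or_sample_weight",
    "calibration_factors",  # e.g. timing and beam size
)


def missing_fields(record: dict, template: tuple[str, ...] = TEMPLATE) -> list[str]:
    """Return template fields that are absent, None, or blank in `record`."""
    missing = []
    for field in template:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            missing.append(field)
    return missing


def worksheet(template: tuple[str, ...] = TEMPLATE) -> str:
    """Render the template as a blank worksheet to print and post near the instrument."""
    return "\n".join(f"{field}: ____________" for field in template)
```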
  • 30. Documentation • How? – README.txt • For digital information, address the questions – “What the heck am I looking at?” – “Where do I find X?” • Use for project description in main folder • Use to document conventions • Use wherever you need extra clarity
  • 31. Example • Project-wide README.txt – Basic project information • Title • Contributors • Grant info • etc. – Contact information for at least one person – All locations where data live, including backups
  • 32. Example “Talk_v1: rough outline of talk Talk_v2: draft of talk Talk_v3: updated 2014-01-15 after feedback” “ ‘Data’ folder contains all raw data files by date ‘Analysis’ has analyzed data and plots ‘Paper’ has drafts of article on this work”
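The project-wide README.txt described in slides 31 and 32 can be assembled from a few pieces of information. The sketch below shows one possible plain-text layout. The function name and section labels are hypothetical. The contact check reflects the slide's guideline of contact information for at least one person.

```python
def project_readme(
    title: str,
    contributors: list[str],
    grant: str,
    contact: str,
    locations: list[str],
    folders: dict[str, str],
) -> str:
    """Assemble README.txt text covering the slide's project-wide checklist."""
    if not contact.strip():
        raise ValueError("README needs contact information for at least one person")
    lines = [
        f"Title: {title}",
        "Contributors: " + ", ".join(contributors),
        f"Grant info: {grant}",
        f"Contact: {contact}",
        "",
        "Data locations (including backups):",
        *[f"  - {loc}" for loc in locations],
        "",
        "Folder contents:",
        *[f"  '{name}': {desc}" for name, desc in folders.items()],
    ]
    return "\n".join(lines) + "\n"
```

Writing the result with `Path("README.txt").write_text(...)` in the project's main folder keeps the description next to the files it explains.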
  • 33. grover_net, http://www.flickr.com/photos/9246159@N06/599820538/ (CC BY-ND) STORAGE AND BACKUPS
  • 34. Storage • Why? – Need good storage practices to prevent loss – Keep data secure
  • 35. Storage • How? – Library motto: Lots of Copies Keeps Stuff Safe! – Rule of 3: 2 onsite, 1 offsite
  • 36. Storage • How? – Computer – External hard drive – Shared drives/servers – Tape backup – Cloud storage* – CDs/DVDs – USB flash drive Erica Wheelan, https://www.flickr.com/photos/reinventedwheel/5985479866 (CC BY)
  • 37. *Cloud Storage • Read the Terms of Service! • E.g. Google Drive – “When you upload or otherwise submit content to our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content. The rights you grant in this license are for the limited purpose of operating, promoting, and improving our Services, and to develop new ones”
  • 38. Backups http://toystory.disney.com/
  • 39. Backups • How? – Any backup is better than none – Automatic backup is better than manual – Your work is only as safe as your backup plan
  • 40. Backups • How? – Check your backups • Backups only as good as ability to recover data • Test your backups periodically – Preferably a fixed schedule – 1 or 2 times a year may be enough – Bigger/more complex backups should be checked more often • Test your backup whenever you change things
  • 41. Example • I keep my data – On my computer – Backed up manually on shared drive • I set a weekly reminder to do this – Backed up automatically via SpiderOak cloud storage
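Slide 40's advice to test backups can be partly automated by comparing checksums of the original files against the backup copy. This sketch assumes the backup is an ordinary folder that mirrors the original, such as a copy on a shared drive. It confirms the copy is complete and byte-identical. It does not replace an occasional test restore from your actual backup software or cloud service. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large data files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Report files from `source` that are missing from, or differ in, `backup`.

    Extra files that exist only in the backup are ignored.
    """
    problems = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        copy = backup / rel
        if not copy.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif sha256(src) != sha256(copy):
            problems.append(f"differs: {rel.as_posix()}")
    return problems
```

An empty list means every original file has an identical copy. Running this on a fixed schedule, and again after changing the backup setup, matches the slide's advice.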
  • 42. FUTURE FILE USABILITY Ian, http://www.flickr.com/photos/ian-s/2152798588/ (CC BY-NC-ND)
  • 43. Future File Usability • What? – Can you read your files from 10 years ago? – Data needs to be • Accessible • Interpretable • Readable
  • 44. lukasbenc, https://www.flickr.com/photos/lukasbenc/3493808772 (CC BY-NC-SA)
  • 45. Future File Usability • Why? – You may want to use the data in 5 years – PI sometimes keeps data and notes – Prep for data sharing – Per OMB Circular A-110, must retain data at least 3 years post-project • Better to retain for >6 years
  • 46. Future File Usability • When? – When you wrap up a project – (As you work on a project)
  • 47. Future File Usability • How? – Back up written notes • People always forget this one • Difficult to interpret data without notes • Options – Digitally scan (recommended with digital data) – Photocopies
  • 48. Future File Usability • How? – Convert file formats • Can you open digital files from 10 years ago? • Use open, non-proprietary formats that are in wide use – .docx → .txt – .xlsx → .csv – .jpg → .tif • Save a copy in the old format, just in case • Preserve software if no open file format
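Before converting formats, it helps to list which files still lack an open-format copy. The sketch below uses only the three conversions shown on slide 48. It assumes a file is done when a file with the same name and the suggested extension sits beside it; that convention and the function name are hypothetical. It does not perform the conversion itself, because that depends on the software that created each file. The original stays in place as the "just in case" copy.

```python
from pathlib import Path

# Conversions suggested on the slide; extend this table for your own file types.
OPEN_ALTERNATIVES = {
    ".docx": ".txt",
    ".xlsx": ".csv",
    ".jpg": ".tif",
}


def conversion_todo(folder: Path) -> list[tuple[str, str]]:
    """List (relative path, suggested extension) for files not yet converted."""
    todo = []
    for f in sorted(folder.rglob("*")):
        target_ext = OPEN_ALTERNATIVES.get(f.suffix.lower())  # .JPG counts as .jpg
        if f.is_file() and target_ext and not f.with_suffix(target_ext).exists():
            todo.append((f.relative_to(folder).as_posix(), target_ext))
    return todo
```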
  • 49. Future File Usability • How? – Move to new media • Hardware dies and becomes obsolete – Floppy disks! • Expect average lifetime to be 3-5 years • Keep up with technology
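Slide 49's 3-5 year lifetime estimate can be turned into a simple migration schedule. This sketch uses the conservative 3-year end of that range. The device names and function names are hypothetical, and real lifetimes vary by device and use.

```python
import datetime

LIFETIME_YEARS = 3  # conservative end of the slide's 3-5 year estimate


def migrate_by(bought: datetime.date, years: int = LIFETIME_YEARS) -> datetime.date:
    """Date by which data on a storage device should be moved to new media."""
    try:
        return bought.replace(year=bought.year + years)
    except ValueError:  # bought on 29 February and the target year is not a leap year
        return bought.replace(year=bought.year + years, day=28)


def due_for_migration(media: dict[str, datetime.date], today: datetime.date) -> list[str]:
    """Names of devices whose migration date has arrived or passed."""
    return sorted(name for name, bought in media.items() if migrate_by(bought) <= today)
```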
  • 50. WHERE TO GO FROM HERE
  • 51. Center for Teaching Vanderbilt University, https://www.flickr.com/photos/vandycft/8244800868 (CC BY-NC)
  • 52. easylocum, https://www.flickr.com/photos/easylocum/2921542814 (CC BY)
  • 53. Chris Hoving, https://www.flickr.com/photos/pcrucifer/2433274595 (CC BY-ND)
  • 54. Resources • Data Ab Initio blog – http://dataabinitio.com/ • eScience Portal – http://esciencelibrary.umassmed.edu/ • DataONE Best Practices – http://www.dataone.org/best-practices
  • 55. Steal My Slides • Slides + recording available – http://connect.ala.org/node/220603 • Slides available – http://www.slideshare.net/kbriney
  • 56. Thank You! • This presentation available under a Creative Commons Attribution (CC-BY) license • Some content courtesy of Dorothea Salo – http://www.graduateschool.uwm.edu/research/researcher-central/proposal-development/data-plan/boot-camp/ (CC BY)