Practical Data Management
ACRL DCIG Webinar
30 April 2014
Kristin Briney, PhD
Slides from an ACRL DCIG webinar from 30 April 2014 discussing basic data management practices in file organization and naming, documentation, storage and backup, and making files usable in the future.

Published in: Data & Analytics, Technology

  1. Practical Data Management ACRL DCIG Webinar 30 April 2014 Kristin Briney, PhD
  2. andrius.v, https://www.flickr.com/photos/banditaz/6823875954 (CC BY-NC-SA)
  3. Mr.TinDC, https://www.flickr.com/photos/mr_t_in_dc/5940438148 (CC BY-ND)
  4. International Institute of Tropical Agriculture, https://www.flickr.com/photos/iita-media-library/8160877379 (CC BY-NC) Musgo Dumio_Momio, https://www.flickr.com/photos/30976576@N07/2903662286 (CC BY-NC-SA)
  5. Jen Doty and Rob O'Reilly, “Learning to Curate @ Emory”. RDAP 2014
  6. Data Management Basics • Introduction to a few topics in data management – File organization and naming – Documentation – Storage and backups – Future file usability
  7. Data Management Basics • Introduction to a few topics in data management – File organization and naming – Documentation – Storage and backups – Future file usability → Teach & Use
  8. For each minute of planning at the beginning of a project, you will save 10 minutes of headache later
  9. FILE ORGANIZATION & NAMING Dan Zen, http://www.flickr.com/photos/danzen/5551831155/ (CC BY)
  10. File Organization • What? – Keeping your files in order
  11. File Organization • Why? – Easier to find and use data – Tell, at a glance, what is done and what you have yet to do – Can still find and use files in the future
  12. File Organization • When? – Always! – Get in the habit of putting files in the right place
  13. File Organization • How? – Any system is better than none – Make your system logical for your data • 80/20 Rule – Possibilities • By project • By analysis type • By date • …
  14. Example • Thesis – By chapter • By file type (draft, figure, table, etc.) • Data – By researcher • By analysis type – By date
  15. File Naming Conventions • What? – Consistent naming for files
  16. http://retractionwatch.com/2014/01/07/doing-the-right-thing-authors-retract-brain-paper-with-systematic-human-error-in-coding/
  17. File Naming Conventions • Why? – Make it easier to find files – Avoid duplicates – Make it easier to wrap up a project because you know which files belong to it
  18. File Naming Conventions • When? – For a group of related files (3 to 1000+) – May need different conventions for different groups
  19. File Naming Conventions • How? – Pick what is most important for your name • Date • Site • Analysis • Sample • Short description
  20. File Naming Conventions • How? – Files should be named consistently – File names should be descriptive but short (<25 characters) – Use underscores instead of spaces – Avoid these characters: " / : * ? ' < > [ ] & $ – Use the dating convention: YYYY-MM-DD
  21. Example • YYYYMMDD_site_sampleNum – 20140422_PikeLake_03 – 20140424_EastLake_12 • Analysis-sample-concentration – UVVis-stilbene-10mM – IR-benzene-pure
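
The naming rules on the two slides above can be checked mechanically. What follows is a minimal Python sketch and not part of the original talk. The function names `make_name` and `check_name` are hypothetical, the quote characters to avoid are assumed to be the plain ASCII ones, and the 25-character limit is assumed to apply to the name without its extension.

```python
from datetime import date

# Characters the slide says to avoid, plus the space (use underscores instead).
# Assumption: the quotes on the slide stand for the plain ASCII " and '.
FORBIDDEN = set('"/:*?\'<>[]&$ ')
MAX_LEN = 25  # assumption: the "<25 characters" limit excludes the extension


def make_name(day: date, site: str, sample_num: int) -> str:
    """Build a name following the YYYYMMDD_site_sampleNum pattern."""
    return f"{day:%Y%m%d}_{site}_{sample_num:02d}"


def check_name(name: str) -> list[str]:
    """Return a list of problems; an empty list means the name passes."""
    problems = []
    if len(name) >= MAX_LEN:
        problems.append(f"{len(name)} characters; keep it under {MAX_LEN}")
    bad = sorted(set(name) & FORBIDDEN)
    if bad:
        problems.append("avoid these characters: " + " ".join(repr(c) for c in bad))
    return problems


if __name__ == "__main__":
    name = make_name(date(2014, 4, 22), "PikeLake", 3)
    print(name, check_name(name))
```

Running a check like this over a folder before archiving a project is one way to catch names that drifted from the convention.
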
  22. DOCUMENTATION Brady, https://www.flickr.com/photos/freddyfromutah/4424199420 (CC BY)
  23. What would someone unfamiliar with your data need in order to find, evaluate, understand, and reuse them?
  24. Documentation • Why? – Data without notes are unusable – Because you won’t remember everything – For others who may need to use your files
  25. Documentation • When? – Always – Documentation needs will vary between files
  26. Documentation • How? – Take good notes – Metadata schemas • http://www.dcc.ac.uk/resources/metadata-standards
  27. Documentation • How? – Methods • Protocols • Code • Survey • Codebook • Data dictionary • Anything that lets someone reproduce your results
  28. Documentation • How? – Templates • Like structured metadata but easier • Decide on a list of information before you collect data – Make sure you record all necessary details – Takes a few minutes upfront, easy to use later • Print and post in prominent place or use as worksheet
  29. Example • I need to collect: – Date – Experiment – Scan number – Powers – Wavelengths – Concentration (or sample weight) – Calibration factors, like timing and beam size
  30. Documentation • How? – README.txt • For digital information, address the questions – “What the heck am I looking at?” – “Where do I find X?” • Use for project description in main folder • Use to document conventions • Use wherever you need extra clarity
  31. Example • Project-wide README.txt – Basic project information • Title • Contributors • Grant info • etc. – Contact information for at least one person – All locations where data live, including backups
  32. Example “Talk_v1: rough outline of talk Talk_v2: draft of talk Talk_v3: updated 2014-01-15 after feedback” “ ‘Data’ folder contains all raw data files by date ‘Analysis’ has analyzed data and plots ‘Paper’ has drafts of article on this work”
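
The two README examples above could be combined into a small project starter. This is a hedged Python sketch, not something shown in the webinar: the names `build_readme` and `create_project` are hypothetical, the angle-bracket values are placeholders to fill in, and the folder descriptions are copied from the example slide.

```python
from pathlib import Path

# Placeholder values (hypothetical): replace with real project details.
PROJECT_INFO = {
    "Title": "<project title>",
    "Contributors": "<names>",
    "Grant info": "<funder and award number>",
    "Contact": "<name and email of at least one person>",
    "Data locations": "<every place the data live, including backups>",
}

# Folder descriptions taken from the example slide.
FOLDERS = {
    "Data": "all raw data files by date",
    "Analysis": "analyzed data and plots",
    "Paper": "drafts of article on this work",
}


def build_readme(info: dict[str, str], folders: dict[str, str]) -> str:
    """Format the project information and folder conventions as plain text."""
    lines = [f"{key}: {value}" for key, value in info.items()]
    lines.append("")
    lines.append("Folders:")
    lines += [f"  {name}/ - {desc}" for name, desc in folders.items()]
    return "\n".join(lines) + "\n"


def create_project(root: Path) -> Path:
    """Create the folders and write README.txt in the main project folder."""
    for name in FOLDERS:
        (root / name).mkdir(parents=True, exist_ok=True)
    readme = root / "README.txt"
    readme.write_text(build_readme(PROJECT_INFO, FOLDERS), encoding="utf-8")
    return readme
```

Calling `create_project(Path("my_project"))` would create the three folders and a README.txt to edit by hand; plain text is used so the file stays readable without special software.
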
  33. grover_net, http://www.flickr.com/photos/9246159@N06/599820538/ (CC BY-ND) STORAGE AND BACKUPS
  34. Storage • Why? – Need good storage practices to prevent loss – Keep data secure
  35. Storage • How? – Library motto: Lots of Copies Keeps Stuff Safe! – Rule of 3: 2 onsite, 1 offsite
  36. Storage • How? – Computer – External hard drive – Shared drives/servers – Tape backup – Cloud storage* – CDs/DVDs – USB flash drive Erica Wheelan, https://www.flickr.com/photos/reinventedwheel/5985479866 (CC BY)
  37. *Cloud Storage • Read the Terms of Service! • E.g. Google Drive – “When you upload or otherwise submit content to our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content. The rights you grant in this license are for the limited purpose of operating, promoting, and improving our Services, and to develop new ones”
  38. Backups http://toystory.disney.com/
  39. Backups • How? – Any backup is better than none – Automatic backup is better than manual – Your work is only as safe as your backup plan
  40. Backups • How? – Check your backups • Backups only as good as ability to recover data • Test your backups periodically – Preferably a fixed schedule – 1 or 2 times a year may be enough – Bigger/more complex backups should be checked more often • Test your backup whenever you change things
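
One way to test a backup, as the slide above recommends, is to compare every file against its copy. The sketch below is an illustration in Python under stated assumptions, not the speaker's method: it assumes the backup is a plain folder that mirrors the source folder, and the names `sha256_of` and `verify_backup` are hypothetical. A matching checksum shows the copy is intact; it does not replace an occasional full restore.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large data files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> dict[str, list[str]]:
    """List source files that are missing from, or differ in, the backup."""
    report = {"missing": [], "different": []}
    for original in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = original.relative_to(source)
        copy = backup / relative
        if not copy.is_file():
            report["missing"].append(relative.as_posix())
        elif sha256_of(original) != sha256_of(copy):
            report["different"].append(relative.as_posix())
    return report
```

An empty report from `verify_backup(Path("Data"), Path("/mnt/backup/Data"))` (both paths hypothetical) would mean every source file has an identical copy. Files that changed since the last backup ran will show up as "different", so the check is most useful right after a backup completes.
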
  41. Example • I keep my data – On my computer – Backed up manually on shared drive • I set a weekly reminder to do this – Backed up automatically via SpiderOak cloud storage
  42. FUTURE FILE USABILITY Ian, http://www.flickr.com/photos/ian-s/2152798588/ (CC BY-NC-ND)
  43. Future File Usability • What? – Can you read your files from 10 years ago? – Data needs to be • Accessible • Interpretable • Readable
  44. lukasbenc, https://www.flickr.com/photos/lukasbenc/3493808772 (CC BY-NC-SA)
  45. Future File Usability • Why? – You may want to use the data in 5 years – PI sometimes keeps data and notes – Prep for data sharing – Per OMB Circular A-110, must retain data at least 3 years post-project • Better to retain for >6 years
  46. Future File Usability • When? – When you wrap up a project – (As you work on a project)
  47. Future File Usability • How? – Back up written notes • People always forget this one • Difficult to interpret data without notes • Options – Digitally scan (recommended with digital data) – Photocopies
  48. Future File Usability • How? – Convert file formats • Can you open digital files from 10 years ago? • Use open, non-proprietary formats that are in wide use – .docx → .txt – .xlsx → .csv – .jpg → .tif • Save a copy in the old format, just in case • Preserve software if no open file format
  49. Future File Usability • How? – Move to new media • Hardware dies and becomes obsolete – Floppy disks! • Expect average lifetime to be 3-5 years • Keep up with technology
  50. WHERE TO GO FROM HERE
  51. Center for Teaching Vanderbilt University, https://www.flickr.com/photos/vandycft/8244800868 (CC BY-NC)
  52. easylocum, https://www.flickr.com/photos/easylocum/2921542814 (CC BY)
  53. Chris Hoving, https://www.flickr.com/photos/pcrucifer/2433274595 (CC BY-ND)
  54. Resources • Data Ab Initio blog – http://dataabinitio.com/ • eScience Portal – http://esciencelibrary.umassmed.edu/ • DataONE Best Practices – http://www.dataone.org/best-practices
  55. Steal My Slides • Slides + recording available – http://connect.ala.org/node/220603 • Slides available – http://www.slideshare.net/kbriney
  56. Thank You! • This presentation available under a Creative Commons Attribution (CC-BY) license • Some content courtesy of Dorothea Salo – http://www.graduateschool.uwm.edu/research/researcher-central/proposal-development/data-plan/boot-camp/ (CC BY)
