Data Storage & Preservation
Luke Bluma | Brianna Marshall | Elliott Shuppy
IGERT workshop | November 2014
Presentation given by Luke Bluma, Brianna Marshall, and Elliott Shuppy during the IGERT workshop. November 2014.


  1. Data Storage & Preservation
     Luke Bluma | Brianna Marshall | Elliott Shuppy
     IGERT workshop | November 2014
  2. STORAGE
  3. Outline
     • Problem with Storage
     • Storage vs Backup
     • Storage Types
     • UW-Madison Options
     • Personal Options
     • Best Practices
     • Use Cases
     • Key Takeaways
  4. The Problem with Storage
     • It’s everywhere!
     • All the options seem similar but slightly different
     • Every use case is a little different
  5. Storage vs Backup
     Storage
     Your working files: the files you access regularly and change frequently. You need to store data safely and securely, but you also need to have access to it. In general, losing your storage means losing current versions of the data.
  6. Storage vs Backup
     Backup
     A frequent and regular process of copying your data to a secure place that is separate from where you keep your storage. Backup can be overlooked because you don’t really need it until you lose data, but when you need to restore a file it can be the most important process you have in place.
  7. Rule of 3
     • Keep THREE copies of your data
       – TWO onsite
       – ONE offsite
     • Example:
       – One: Network Drive
       – Two: External Hard Drive
       – Three: Cloud Storage
     • This ensures that your storage and backups are not all in the same place – that’s too risky!
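The Rule of 3 can be written down as a small check. This is only a sketch: the `Copy` class and its `label` and `onsite` fields are hypothetical names introduced here, the check treats "three copies" as a minimum, and it assumes the network drive and external hard drive in the slide's example count as the two onsite copies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    """One copy of a dataset (hypothetical structure, for illustration only)."""
    label: str
    onsite: bool


def satisfies_rule_of_three(copies: list[Copy]) -> bool:
    """Return True when at least two copies are onsite and at least one is offsite."""
    onsite = sum(1 for c in copies if c.onsite)
    offsite = sum(1 for c in copies if not c.onsite)
    return onsite >= 2 and offsite >= 1


# The example from the slide. Which copies count as onsite is an assumption.
plan = [
    Copy("Network drive", onsite=True),
    Copy("External hard drive", onsite=True),
    Copy("Cloud storage", onsite=False),
]
print(satisfies_rule_of_three(plan))
```

Dropping the cloud copy from `plan` makes the check fail, which is the "all in the same place" situation the slide warns about.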
  8. Storage Types
     • Local storage
       – Hard drive, external hard drive, thumb drive, etc.
     • Network storage
       – Private cloud, public cloud, etc.
     • Private Cloud = network storage run by UW
     • Public Cloud = network storage run by vendor
  9. UW Data – Storage Options
     • Local Storage/Backup Options
       – External Hard Drive (TechStore)
     • Local IT Options
       – Services available depend on your local IT department
     • DoIT Options
       – Storage: File and Block Storage
       – Backup: Bucky Backup Lite
     • Cloud Options
       – UW’s Box Account
  10. UW Data – DoIT Options
      • Storage: File and Block Storage
        – File: easy to access, manage and share with other UW folks
        – Block: additional raw storage available over the network for your server
      • Backup: Bucky Backup Lite
        – Client runs on your computer or server and does incremental backups nightly
        – You can manage the retention policy and version control
      • Cloud Storage
        – UW’s Box Account
  11. Personal Data – Storage Options
      • Personal Data
        – Your personal UW data: UW’s Box Account
        – Your personal data: thumb drive, external hard drive, or cloud options like Box, CrashPlan, Dropbox, etc.
      • Discount with CrashPlan
        – 30% off: http://go.wisc.edu/crashplan
  12. Evaluating Cloud Services
      • Lots of options out there – and not all are created equal
      • Read the Terms of Service!
      • Servers get hacked all the time. Whatever you’re storing, you don’t want your provider to have access to it.
      • Data encryption is your friend.
  13. Storage & Backup Best Practices
      • Think about and plan your data management strategy before storing data
      • If the data has ANY value to you, back it up
      • If you have questions, ask for help! Local IT, RDS, peers, friends, etc.
      • Network storage is great, but think about having a plan in place if you need to access the data and the network is down
  14. Storage & Backup Best Practices
      • Put in the appropriate security measures
      • Version control can be important, especially when sharing data – plan ahead
      • Document who has access to the data and audit that on a regular basis
      • Test your backups – make sure they are working and you can actually restore a file
      • If you use cloud storage, think about an exit strategy
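One way to "test your backups" is to restore them to a scratch folder and compare checksums against the originals. The sketch below assumes both copies are ordinary directories on disk; the function names `sha256_of` and `verify_restore` are illustrative and not part of any UW or vendor tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """List files that are missing from, or differ in, the restored copy."""
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(restored_file):
            problems.append(f"changed: {relative.as_posix()}")
    return problems
```

An empty list means every source file was restored byte for byte. A "changed" entry is not always a failure, since working files edited after the last backup will legitimately differ.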
  15. Use Case 1 – Starting Fresh
      • If you have a local IT person, contact them first to talk about services available
      • Contact RDS about a data management plan
      • If local IT doesn’t have service offerings, contact DoIT
      • If all else fails – at least plan out your data management strategy (storage, backup, etc.) before starting to collect/use data
  16. Use Case 2 – Leaving UW
      • UW Data
        – If you have a local IT person, contact them
        – If someone will be taking over your work, give them access to a shared space like Box
        – If you are using DoIT services, make sure someone else still on campus has access to the data
        – If you don’t have local IT and aren’t using shared services, but think the data is valuable to UW, contact RDS
      • Personal Data
        – If you are using UW Box, then transfer the data over to a personal Box/Dropbox/Cloud account
        – Purchase an external hard drive and transfer data over that way
  17. Key Takeaways
      • Figure out your storage requirements
        – High security? Remote access? Ease of use? Scalability?
      • Ask around – people are happy to help!
        – Local IT, Peers, Friends, Family, etc.
      • Rule of 3
        – 2 onsite, 1 offsite – better safe than sorry
      • Test it!
        – Make sure it works as advertised and do some disaster testing
  18. PRESERVATION
  19. Storage & Backup vs. Preservation
      Storage & Backup = short-term
        – Working copies
        – Expected to change
      Preservation = long-term
        – Usually the final, “fixed” version(s)
  20. Thinking Long-Term
      • The data you’ve carefully stored is only useful if it’s readable and understandable
      • Many factors affect this:
        – Media
          • What software did you use to create the data? Does hardware exist to access it?
        – Metadata
          • How much contextual information accompanies your data? Can you understand it? Can a stranger understand it?
        – Organization
          • Is it all jumbled together? Or have you organized it meaningfully? Do you know where your data is?
  21. Thinking Long-Term
      • None of the concepts discussed during this workshop exist in a vacuum
      • Some aspects of preservation feel out of our control, like too much work
      • The truth? It is confusing to plan ahead for our data in a landscape of quickly changing services…
      • … but it’s worth it.
  22. Time to Ponder
      • Can you still access your data from…
        – 20 years ago?
        – 10 years ago?
        – 5 years ago?
        – 1 year ago?
      Let’s talk about the data you’ve kept and lost.
  23. Unreadable Data
      CULPRITS
      • Obsolete media
      • Obsolete software & file formats
      • Obsolete hardware
      CC image by Flickr user wlef70
  24. Unreadable Data: Solutions
      Now
      • Start researching. (Google!) Odds are someone else has faced the same issue.
      • Digital forensics tools such as BitCurator can provide guidance: http://www.bitcurator.net/
      • Don’t assume your data is gone for good.
      • Contact me to brainstorm.
  25. Unreadable Data: Solutions
      Moving forward
      • Today’s popular software can become obsolete through business deals, new versions, or a gradual decline in user base. (Consider WordPerfect.)
      • Anticipate an average media lifespan of 3-5 years. Migrate your files every few years, if not more frequently!
      • Some file formats are less susceptible to obsolescence than others
        – Open, non-proprietary formats (pick TXT over DOCX, CSV over XLSX, TIF over JPG)
        – Wide adoption
        – History of backward compatibility
        – Metadata support in open format (XML)
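The format advice above can be turned into a quick scan of a project folder. This is a minimal sketch: the `OPEN_ALTERNATIVES` table only encodes the three pairings named on the slide, and the function name is hypothetical. It reports candidates for migration and does not convert anything.

```python
from pathlib import Path

# Pairings named on the slide; illustrative only, not an exhaustive registry.
OPEN_ALTERNATIVES = {".docx": ".txt", ".xlsx": ".csv", ".jpg": ".tif"}


def suggest_migrations(root: Path) -> dict[str, str]:
    """Map each file with a listed extension to the suggested alternative format."""
    suggestions = {}
    for path in sorted(root.rglob("*")):
        alternative = OPEN_ALTERNATIVES.get(path.suffix.lower())
        if path.is_file() and alternative:
            suggestions[path.relative_to(root).as_posix()] = alternative
    return suggestions
```

Whether a given file should actually be converted is a judgment call. Saving a formatted document as plain text loses the formatting, for example, so the output is best read as a review list.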
  26. Lost Data
      Now
      • Do a data inventory. List all the places where your data lives (both physical and digital)
      • Plan for consolidating – follow the rule of 3, not the rule of 17
      Moving forward
      • Too many copies can be a headache: hard to keep track of versions and know what is where. It makes sense to start a data inventory to track your data, especially at the beginning of a big project with many people and moving parts.
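For the digital half of a data inventory, a short script can list what lives where. The sketch below assumes each storage location is reachable as a directory path; the location names, the column names, and the functions `build_inventory` and `write_inventory` are all made up for illustration.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

COLUMNS = ["location", "path", "bytes", "modified_utc"]


def build_inventory(locations: dict[str, Path]) -> list[dict[str, str]]:
    """Record every file under each named location, one row per file."""
    rows = []
    for name, root in locations.items():
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
            rows.append({
                "location": name,
                "path": path.relative_to(root).as_posix(),
                "bytes": str(stat.st_size),
                "modified_utc": modified.isoformat(timespec="seconds"),
            })
    return rows


def write_inventory(rows: list[dict[str, str]], destination: Path) -> None:
    """Save the inventory as CSV, an open format that should stay readable."""
    with destination.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `build_inventory({"laptop": Path("..."), "external drive": Path("...")})` with your own paths gives one table covering all locations. Physical items such as notebooks or old disks still need to be added by hand.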
  27. Decontextualized Data
      Coded SPSS survey responses
      (Useless without the original questionnaires)
  28. Decontextualized Data: Solutions
      Now
      • Write contextual information in the form of a readme file and/or scan written notes.
      • Publish as additional bitstream to your datasets.
      • Accept that some old data will never have necessary contextual information. Is it worth it to preserve it?
      Moving forward
      • Take the time to create metadata.
      • At the very least, create a readme file. (Good example located here: http://hdl.handle.net/2022/17155)
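A readme is easier to write when a blank template is already sitting next to the data. The sketch below drops a plain-text template into a dataset folder. The section headings are assumptions chosen for illustration (they are not taken from the linked example), and `write_readme` is a hypothetical helper name.

```python
from pathlib import Path

# Hypothetical headings; adapt them to your discipline and project.
README_SECTIONS = [
    "Creator(s) and contact",
    "Dates of collection",
    "Description of the data",
    "File list and folder structure",
    "Variable names and code definitions",
    "Software needed to open the files",
]


def readme_template(title: str) -> str:
    """Build a plain-text readme skeleton with one empty section per heading."""
    lines = [title, "=" * len(title), ""]
    for section in README_SECTIONS:
        lines += [section + ":", "", ""]
    return "\n".join(lines)


def write_readme(dataset_dir: Path, title: str) -> Path:
    """Create README.txt in the dataset folder without overwriting an existing one."""
    target = dataset_dir / "README.txt"
    # Mode "x" raises FileExistsError instead of overwriting earlier notes.
    with target.open("x", encoding="utf-8") as handle:
        handle.write(readme_template(title))
    return target
```

The "Variable names and code definitions" section is where coded survey responses would be tied back to the original questionnaire wording.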
  29. Repositories
      Disciplinary repositories provide a good home for data, often with the requirement that you share it openly.
      • DataONE: https://www.dataone.org/
      • Dryad: http://datadryad.org/
      • Knowledge Network for Biocomplexity: https://knb.ecoinformatics.org/
  30. Databib & re3data
      Databib and re3data plan to merge their two projects into one service by the end of 2015.
  31. Institutional Help with Preservation
      • IR is not yet up to the task of managing data… but that’s in the works.
      • UW Libraries is a member of the Digital Preservation Network
      • Several distributed, “dark archive” preservation systems are being explored
      • And of course, RDS can help!
  32. Final Thoughts
      • Preservation = thinking about how your data organization, metadata, and storage impact your ability to access your data years from now.
      • Prioritize your most important research. You might not be able to preserve everything.
      • It takes active researcher participation.
      • Any plan is better than no plan at all. Start today. Ask for help.
  33. Contact Us
      • Research Data Services (RDS)
        – http://researchdata.wisc.edu/help/about-us/
      • DoIT Storage and Backup
        – cci@cio.wisc.edu
  34. Questions?
