MANAGING YOUR RESEARCH DATA
Kristin A. Briney
25 October 2012

This talk lays out how to approach data management planning and offers practical solutions to a range of data management issues.

Managing Your Research Data

  1. MANAGING YOUR RESEARCH DATA
     Kristin A. Briney
     25 October 2012
  2. Goals of this Session
     • Why is data management important?
     • How do I approach data management?
     • What tools can I use to manage my data?
     • How will data management change over my career?
  3. Why Manage Your Data?
     • Better organized data makes research easier
     • Find it and use it later
     • Don’t lose it accidentally
     • Meet funder and university requirements
     • Make your advisor and coworkers happy
     • Avoid accusations of misconduct
     • Get credit for your work
  4. Data Sharing
     • Data sharing and data management often go together
     • Data sharing is increasingly common
       • Depends greatly on field
     • “Publicly available data was significantly (p=0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin.”
       • Gene expression microarray data
     Heather Piwowar, et al. “Sharing detailed research data is associated with increased citation rate.” PLoS One 2007. DOI:10.1371/journal.pone.0000308
  5. Approaching Data Management
     • Inventory
       • What data do you have?
       • Some of it may not look like data!
     • Needs
       • What needs to be kept?
       • What may be shared with others?
       • Are there any requirements?
     • Process planning
       • How to preserve important data
  6. Inventory
     • Raw data
       • Observational
       • Experimental
       • Simulation
     • Analyzed data
     • Outside data
     • Videos
     • Sound recordings
     • Images
     • Laboratory notebooks/field notes
     • Code
     • Software and Hardware
     • Figures
  7. Inventory
     • How much data?
     • How fast does it grow?
     • Does it change?
     • What format is it in?
     • What is the infrastructure for collection and storage?
     • How easy is it to share?
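The first questions above (how much data, and in what formats) can be answered with a short script. The following is a minimal Python sketch, not part of the original talk; the function name `inventory` and the idea of grouping by file extension are assumptions made for illustration.

```python
from collections import Counter
from pathlib import Path


def inventory(root):
    """Summarize the files under root: file count and total bytes per extension."""
    counts = Counter()
    sizes = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {ext: {"files": counts[ext], "bytes": sizes[ext]} for ext in counts}
```

Running it on the same folder at intervals gives a rough picture of how fast the data grows.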
  8. Needs
     • Who is the audience for your data?
       • You (and your future self)
       • Coworkers and PI
       • Institutional colleagues
       • Disciplinary colleagues
     • What are your obligations?
       • Funder and University requirements
       • Confidentiality
       • Security
     • How long do you need to keep the data?
  9. Needs
     • UW-Madison policy is to keep research data for 7 years
       • Data must be ‘in sufficient detail’
       • Original data preferred
     • UW-Madison also has a policy for physical specimens
     • Funder requirements
       • NSF requires management and encourages sharing
       • NIH requires sharing for grants over $500k/year
     • Journal requirements
       • Share upon request (Nature)
       • Deposit data in repository (Molecular Ecology)
     http://www.grad.wisc.edu/research/policyrp/rpac/documents/PolicyDataStewardship.pdf
  10. Process Planning
      • Plan how to meet needs with data inventory
      • Document plan and process
      • Be willing to change the plan
        • Best practices are evolving
      • More work upfront makes everything easier later
  11. Process Planning - Nuts and Bolts
      • Documentation
      • Lab notebooks
      • File naming conventions
      • File formats
      • Software and Code
      • Storage
        • Cloud storage
      • Backups
      • Security
  12. Documentation
      • Could another person find, interpret, and use your data?
        • Someone in your lab?
        • Someone in your field?
      • Two types of documentation
        • Methods
        • Metadata
  13. Methods
      • What you did
        • Laboratory procedures
      • Limitations of what you did
      • Code
        • Keep with the data
      • Codebooks
      • Data dictionaries
      • Keep surveys and questionnaires
      • Don’t forget the units!
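A data dictionary with units can be kept next to the data and checked automatically. The following Python sketch is one possible way to do that, not something shown in the talk; the column names, descriptions, and the helper `undocumented_columns` are all hypothetical.

```python
import csv
import io

# Hypothetical data dictionary: one entry per column, with units recorded.
DATA_DICTIONARY = {
    "sample_id": {"description": "Unique sample label", "units": "none"},
    "temp_c": {"description": "Bath temperature", "units": "degrees Celsius"},
    "mass_mg": {"description": "Dry sample mass", "units": "milligrams"},
}


def undocumented_columns(csv_text, dictionary):
    """Return header columns of csv_text that lack a dictionary entry or units."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return [col for col in header if not dictionary.get(col, {}).get("units")]
```

An empty result means every column in the file is documented with units; any names returned still need an entry.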
  14. Metadata
      • Data about data
      • General information about your dataset
        • Title
        • Creator
        • Date
        • Subject
        • Description
        • Identifier
        • Rights
      • Specific information about your dataset
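The general fields listed above can be stored as a small record alongside the dataset. This is a minimal Python sketch under stated assumptions: the lowercase field names, the JSON format, the helper `make_metadata`, and every example value are illustrative choices, not something prescribed by the talk.

```python
import json

# The general fields named on the slide, as lowercase keys (an assumption).
REQUIRED_FIELDS = [
    "title", "creator", "date", "subject", "description", "identifier", "rights",
]


def make_metadata(**fields):
    """Build a metadata record; raise ValueError if a general field is missing."""
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    if missing:
        raise ValueError("missing metadata fields: " + ", ".join(missing))
    return {name: fields[name] for name in REQUIRED_FIELDS}


# All values below are hypothetical examples.
record = make_metadata(
    title="Example titration series",
    creator="A. Researcher",
    date="2012-10-25",
    subject="Analytical chemistry",
    description="pH readings for a hypothetical titration experiment",
    identifier="example-dataset-001",
    rights="CC BY-NC 3.0",
)
metadata_json = json.dumps(record, indent=2)
```

The resulting text could be saved as a file next to the data so the two travel together.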
  15. Metadata Standards
      • Standards help with consistency and interoperability
        • Make sure that you’re recording all necessary information
      • Dublin Core (general, all-purpose)
      • Darwin Core (biology)
      • TEI (text)
      • Too many to name
        • Your subject librarian can help you pick the right standard!
        • Don’t reinvent the wheel
  16. Lab Notebooks
      • Lab notebooks are for metadata and methods
      • Challenge to keep notebook with data
        • Organize/label files with reference to notebook
        • Write data to disk and slip inside lab notebook when storing
      • If you only do one thing: create a Table of Contents
        • Leave room at front, one line=one page
      • Electronic lab notebooks are coming!
        • More later…
  17. File Naming Conventions
      • Easy and effective way to help with data management
      • Label files consistently
      • Descriptive but short name
      • Avoid the characters " / : * ? ' < > [ ] & $
      • Use underscores not spaces
      • Date files and do it consistently (YYYY-MM-DD)
      • For analyzed data, use version numbers
        • Save often to new version number
        • Label final version FINAL
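These naming rules are easy to apply consistently with a small helper. The Python sketch below is an illustration only; the function `make_filename`, the order of the name parts, and the two-digit `v02` version style are assumptions, not a convention given in the talk.

```python
import re
from datetime import date

# Characters the slide says to avoid: " / : * ? ' < > [ ] & $
FORBIDDEN = re.compile(r"[\"/:*?'<>\[\]&$]")


def make_filename(description, day, version, extension):
    """Build a name of the form YYYY-MM-DD_description_vNN.ext."""
    cleaned = FORBIDDEN.sub("", description.strip())
    # Runs of whitespace become single underscores.
    cleaned = re.sub(r"\s+", "_", cleaned)
    return f"{day.isoformat()}_{cleaned}_v{version:02d}.{extension.lstrip('.')}"
```

For example, `make_filename("UV/Vis scan: run 3", date(2012, 10, 25), 2, ".txt")` returns `2012-10-25_UVVis_scan_run_3_v02.txt`. The sketch does not handle the FINAL label; that would still be applied by hand.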
  18. File Formats
      • Choose good formats at outset
        • Open, standardized
        • Documented
        • In wide use
      • Examples:
        • .txt, .jpeg, .xml
      • For older formats: Emulate or migrate
        • When migrating, keep old and new versions
      • Better to transform now rather than later
        • Like floppy disks, which are difficult to read now
  19. Software
      • Files can depend on software
        • Emulate or migrate
      • Choose software that writes to open formats
        • Back-compatibility also important
      • Convert files when switching to different software
        • Keep old and converted files
      • Preserve software if necessary
      • May also be necessary to preserve hardware
  20. Code
      • Document, document, document
      • Remember to preserve code with data
      • For lots of code, consider versioning and repositories
        • Git, Subversion, CVS
        • GitHub, Google Code, SourceForge
      • Can also publish code
      • Useful article on best practices:
        • http://arxiv.org/abs/1210.0530
  21. Storage
      • Short term and long term
        • Different reasons and tools for each
      • Short term
        • Hard drives
        • Flash media
      • Long term storage
        • DoIT’s file and data storage
        • Disciplinary repository
        • Get a trusted storage solution
      • Consult with local IT person/DoIT for right solution for you
  22. Cloud Storage
      • Option for short-term storage
      • Terms of service matter
        • Some systems don’t protect your intellectual property
      • Security conscious? Try SpiderOak
      • Google Drive through UW
      http://arstechnica.com/business/2012/04/spideroak-dropbox-for-the-security-obsessive/
  23. Google Drive
      • UW-Madison and Google have an agreement on these apps:
        • Docs (Drive), Sites, Groups, Contacts
      • Google cannot use your content stored in these apps BUT you must log into Google with your @wisc.edu account
        • Otherwise, your intellectual property is not protected
      • Benefits
        • Trusted storage, tech help from DoIT, legal assistance from UW
      http://www.doit.wisc.edu/googleapps/
  24. Backups
      • Automated whenever possible
      • Lots of Copies Keep Stuff Safe (LOCKSS)
        • Local and remote
      • External hard drives not recommended
        • Failure risk over time
      • Departmental servers
      • Bucky Backup
      • Consult with DoIT for right backup solution
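The "lots of copies" idea can be sketched in a few lines of Python: copy a file to several locations and confirm that each copy matches the original by checksum. This is an illustration rather than a recommended backup tool; the function names and the use of SHA-256 are assumptions, and the destination folders would be whatever local and remote locations apply.

```python
import hashlib
import shutil
from pathlib import Path


def sha256(path):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    expected = sha256(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        if sha256(target) != expected:
            raise IOError(f"checksum mismatch for {target}")
        copies.append(target)
    return copies
```

A script like this can be run by a scheduler so the copies are made automatically, but it does not replace a managed backup service.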
  25. Security
      • HIPAA
      • FERPA
      • Human subject information
      • Protect from malware, etc.
      • Get assistance. Consult with DoIT
  26. Process Planning
      • Someone is responsible for managing the data
        • What happens if they leave?
      • Documentation
        • Data collection methods
        • Data documentation
        • Backup procedures
        • Data accessibility
        • Preservation plans
        • Security issues
  27. The Future of Data Management
      • Digital data allows for many exciting possibilities
      • Digital data will affect
        • Laboratory record keeping
        • Funding requirements
        • Types of research
        • Success as a researcher
      • But digital data can’t be treated like a physical object
        • Major differences between preserving physical objects and preserving bits
        • Need good data management to be able to leverage digital data
  28. Electronic Lab Notebooks
      • Notes and data are digital and kept together
      • Easier to search
      • Easier to organize
      • Want to learn more? Come to my session on ELNs:
        • Wednesday, November 14th
        • 12pm-1pm
        • Chemistry Building, Room 9341
  29. Funding
      • NSF currently encourages sharing
        • May require sharing in the future
        • Change will happen gradually
      • Data sharing means money goes further
        • Data can be used multiple times at no extra cost to funder
      • Will probably see a greater push for data sharing and open access publications
  30. Increased Data Sharing
      • Likely to come from funder requirements
        • Public gets greater return on its investment in research
      • Standardized ways to share data
        • Data repositories (disciplinary, local, funder, etc.)
        • Figshare
        • Data publication
      • Rewards for sharing data
        • Data citation (Web of Knowledge’s new Data Citation Index)
        • Increased article citations
        • Data may eventually be considered for tenure, as articles are
      • New science!
        • Meta-analysis and data mining
  31. Data Management
      • Data management is important
      • You can easily do this in your research
        • Inventory
        • Needs
        • Planning
      • Lots of little things that can help
        • Small steps build up
      • Well-managed data opens up new possibilities
  32. Many Thanks To
      • Ryan Schriver for sharing his slides on this topic
      • Ariel Neff for helping to organize and promote this session
      • Steenbock Library for hosting
  33. Resources
      • Research Data Services
        • http://researchdata.wisc.edu/
      • DoIT
        • http://www.doit.wisc.edu/
      • Your liaison librarian
        • http://www.library.wisc.edu/
      • This presentation available under a Creative Commons Attribution-NonCommercial 3.0 license
