MANAGING YOUR RESEARCH DATA
Kristin A. Briney
25 October 2012
Goals of this Session
• Why is data management important?
• How do I approach data management?
• What tools can I use to manage my data?
• How will data management change over my career?
Why Manage Your Data?
• Better-organized data makes research easier
• Find it and use it later
• Don’t lose it accidentally
• Meet funder and university requirements
• Make your advisor and coworkers happy
• Avoid accusations of misconduct
• Get credit for your work
Data Sharing
• Data sharing and data management often go together
• Data sharing is becoming more common
  • Practices depend greatly on field
• “Publicly available data was significantly (p=0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin.”
  • Gene expression microarray data
Heather Piwowar, et al. “Sharing detailed research data is associated with increased citation rate.” PLoS ONE 2007. DOI:10.1371/journal.pone.0000308
Approaching Data Management
• Inventory
  • What data do you have?
  • Some of it may not look like data!
• Needs
  • What needs to be kept?
  • What may be shared with others?
  • Are there any requirements?
• Process planning
  • How to preserve important data
Inventory
• Raw data
  • Observational
  • Experimental
  • Simulation
• Analyzed data
• Outside data
• Laboratory notebooks/field notes
• Code
• Software and hardware
• Figures
• Videos
• Sound recordings
• Images
Inventory
• How much data?
• How fast does it grow?
• Does it change?
• What format is it in?
• What is the infrastructure for collection and storage?
• How easy is it to share?
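For data already on disk, a short script can answer the first and fourth questions (how much, what format) and, if rerun every few months, the second. This is a minimal sketch that assumes a file's extension is a fair hint at its format; `inventory` and `print_inventory` are hypothetical helper names.

```python
from collections import defaultdict
from pathlib import Path


def inventory(root):
    """Tally files under `root` by extension: {ext: (file_count, total_bytes)}."""
    tally = defaultdict(lambda: [0, 0])
    for path in Path(root).rglob("*"):
        if path.is_file():
            ext = path.suffix.lower() or "(none)"  # extension is only a hint at format
            tally[ext][0] += 1
            tally[ext][1] += path.stat().st_size
    return {ext: (count, size) for ext, (count, size) in tally.items()}


def print_inventory(root):
    """Print the tally, largest total size first."""
    report = inventory(root)
    for ext, (count, size) in sorted(report.items(), key=lambda kv: -kv[1][1]):
        print(f"{ext:10} {count:6d} files {size / 1e6:10.2f} MB")
```

Calling `print_inventory("path/to/project")` (a placeholder path) lists formats by total size; saving that output with a date gives a rough growth record.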
Needs
• Who is the audience for your data?
  • You (and your future self)
  • Coworkers and PI
  • Institutional colleagues
  • Disciplinary colleagues
• What are your obligations?
  • Funder and university requirements
  • Confidentiality
  • Security
• How long do you need to keep the data?
Needs
• UW-Madison policy: keep research data for 7 years
  • Data must be kept “in sufficient detail”
  • Original data preferred
• UW-Madison also has a policy for physical specimens
• Funder requirements
  • NSF requires data management and encourages sharing
  • NIH requires a data sharing plan for grants of $500k or more in direct costs per year
• Journal requirements
  • Share upon request (Nature)
  • Deposit data in a repository (Molecular Ecology)
http://www.grad.wisc.edu/research/policyrp/rpac/documents/PolicyDataStewardship.pdf
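A retention rule is easier to follow if the keep-until date is computed and written down next to the data. The slide does not say when the 7-year clock starts, so this sketch assumes you supply a start date (for example project end or publication) and should check the policy linked above for the actual rule; `retain_until` and `may_discard` are hypothetical names.

```python
from datetime import date


def retain_until(start: date, years: int = 7) -> date:
    """Return the date `years` full years after `start`.

    A Feb 29 start with a non-leap target year rolls forward to Mar 1,
    so the retention period is never shorter than `years` years.
    """
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        return date(start.year + years, 3, 1)


def may_discard(start: date, today: date, years: int = 7) -> bool:
    """True once the retention period that began on `start` has elapsed."""
    return today >= retain_until(start, years)
```

Rolling Feb 29 forward rather than back is a deliberately conservative choice: it errs toward keeping data one extra day.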
Process Planning
• Plan how to meet your needs, given your data inventory
• Document the plan and process
• Be willing to change the plan
  • Best practices are evolving
• More work upfront makes everything easier later
Process Planning - Nuts and Bolts
• Documentation
• Lab notebooks
• File naming conventions
• File formats
• Software and code
• Storage
• Cloud storage
• Backups
• Security
Documentation
• Could another person find, interpret, and use your data?
  • Someone in your lab?
  • Someone in your field?
• Two types of documentation
  • Methods
  • Metadata
Methods
• What you did
  • Laboratory procedures
• Limitations of what you did
• Code
  • Keep it with the data
• Codebooks
• Data dictionaries
• Keep surveys and questionnaires
• Don’t forget the units!
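A data dictionary is most useful when it is machine-readable and stored next to the data, with a column for units. The sketch below uses hypothetical variable names and a three-column layout (`variable`, `units`, `description`) chosen for illustration, and flags any data column the dictionary does not describe.

```python
import csv
import io

# Hypothetical data dictionary: one row per variable, with units and meaning.
DATA_DICTIONARY = """variable,units,description
sample_id,,Sample identifier (matches lab notebook entry)
temp_c,degrees Celsius,Incubation temperature
od600,absorbance,Optical density at 600 nm
"""


def load_dictionary(text):
    """Parse the dictionary CSV into {variable: row}."""
    return {row["variable"]: row for row in csv.DictReader(io.StringIO(text))}


def undocumented_columns(data_text, dictionary):
    """Return header columns of a CSV data file that the dictionary does not describe."""
    header = next(csv.reader(io.StringIO(data_text)))
    return [col for col in header if col not in dictionary]


dictionary = load_dictionary(DATA_DICTIONARY)
print(undocumented_columns("sample_id,temp_c,od600,ph\n", dictionary))  # ['ph']
```

In practice the dictionary would live in its own .csv file beside the data it describes rather than in a string.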
Metadata
• Data about data
• General information about your dataset
  • Title
  • Creator
  • Date
  • Subject
  • Description
  • Identifier
  • Rights
• Specific information about your dataset
Metadata Standards
• Standards help with consistency and interoperability
  • Make sure that you’re recording all necessary information
• Dublin Core (general, all-purpose)
• Darwin Core (biology)
• TEI (text)
• Too many to name
  • Your subject librarian can help you pick the right standard!
  • Don’t reinvent the wheel
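The general fields on the Metadata slide (title, creator, date, subject, description, identifier, rights) are all Dublin Core elements, so a simple record can be generated with the standard library. This is a sketch only: the `metadata` wrapper element and the example values are placeholders, and a repository may expect a specific container format, so check what yours accepts.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}


def dublin_core_record(fields):
    """Serialize {element: value or [values]} as XML with dc:-prefixed elements."""
    root = ET.Element("metadata")  # placeholder wrapper, not defined by Dublin Core
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"{name!r} is not a Dublin Core element")
        for value in [values] if isinstance(values, str) else values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


record = dublin_core_record({
    "title": "Yeast growth curves at three temperatures",  # hypothetical dataset
    "creator": "Example Lab",
    "date": "2012-10-25",
    "subject": ["yeast", "growth curve"],
    "identifier": "example-dataset-001",
    "rights": "CC BY-NC 3.0",
})
print(record)
```

Rejecting unknown element names catches typos such as `author` (Dublin Core uses `creator`) before the record is saved.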
Lab Notebooks
• Lab notebooks are for metadata and methods
• Challenge: keeping the notebook with the data
  • Organize/label files with references to the notebook
  • Write data to disk and slip it inside the lab notebook when storing
• If you only do one thing: create a table of contents
  • Leave room at the front; one line = one page
• Electronic lab notebooks are coming!
  • More later…
File Naming Conventions
• Easy and effective way to help with data management
• Label files consistently
• Use descriptive but short names
• Avoid " / : * ? ' < > [ ] & $ characters
• Use underscores, not spaces
• Date files, and do it consistently (YYYY-MM-DD)
• For analyzed data, use version numbers
  • Save often to a new version number
  • Label the final version FINAL
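These rules are easy to apply consistently if a small helper builds the names. The sketch below assumes a hypothetical pattern, DATE_project_description_vNN.ext; the pattern, the two-digit version, and the function names are illustrative choices, not a standard.

```python
import re
from datetime import date
from pathlib import Path

# Characters the slide says to avoid in file names.
AVOID = set("\"/:*?'<>[]&$")


def _stem(day, project, description):
    """DATE_project_description, with spaces turned into underscores."""
    parts = (day.isoformat(), project, description)  # isoformat() gives YYYY-MM-DD
    stem = "_".join(p.strip().replace(" ", "_") for p in parts)
    bad = AVOID.intersection(stem)
    if bad:
        raise ValueError(f"avoid these characters in file names: {sorted(bad)}")
    return stem


def make_name(day, project, description, version, ext, final=False):
    """Build e.g. 2012-10-25_yeast_growth_curve_v03.csv (or ..._FINAL.csv)."""
    tag = "FINAL" if final else f"v{version:02d}"
    return f"{_stem(day, project, description)}_{tag}.{ext.lstrip('.')}"


def next_version(folder, day, project, description, ext):
    """Return one more than the highest vNN already saved in `folder` (1 if none)."""
    pattern = re.compile(
        re.escape(_stem(day, project, description))
        + r"_v(\d+)\."
        + re.escape(ext.lstrip("."))
        + "$"
    )
    versions = []
    for path in Path(folder).iterdir():
        match = pattern.match(path.name)
        if match:
            versions.append(int(match.group(1)))
    return max(versions, default=0) + 1


if __name__ == "__main__":
    print(make_name(date(2012, 10, 25), "yeast", "growth curve", 3, "csv"))
    # prints 2012-10-25_yeast_growth_curve_v03.csv
```

Putting the ISO date first means an alphabetical file listing is also a chronological one.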
File Formats
• Choose good formats at the outset
  • Open, standardized
  • Documented
  • In wide use
• Examples:
  • .txt, .jpeg, .xml
• For older formats: emulate or migrate
  • When migrating, keep old and new versions
• Better to convert now than later
  • Like floppy disks, which are difficult to read now
Software
• Files can depend on software
  • Emulate or migrate
• Choose software that writes to open formats
  • Backward compatibility is also important
• Convert files when switching to different software
  • Keep old and converted files
• Preserve software if necessary
• May also be necessary to preserve hardware
Code
• Document, document, document
• Remember to preserve code with data
• For lots of code, consider version control and repositories
  • Git, Subversion, CVS
  • GitHub, Google Code, SourceForge
• Can also publish code
• Useful article on best practices:
  • http://arxiv.org/abs/1210.0530
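If the code is under Git, one way to keep code and data connected is to record the exact commit that produced each result. This sketch assumes Git is installed and writes a hypothetical sidecar file named `<result>.provenance.json`; when Git or a repository is unavailable it records `null` rather than failing.

```python
import json
import subprocess
import sys
from datetime import datetime, timezone
from pathlib import Path


def git_commit(repo="."):
    """Return the current Git commit hash of `repo`, or None if unavailable."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None  # Git not installed, or `repo` is not a Git repository
    return out.stdout.strip()


def write_provenance(result_path, repo="."):
    """Write <result>.provenance.json recording which code produced result_path."""
    result_path = Path(result_path)
    record = {
        "result": result_path.name,
        "git_commit": git_commit(repo),
        "python": sys.version.split()[0],
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    sidecar = result_path.with_name(result_path.name + ".provenance.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar
```

A plain JSON sidecar keeps the record readable without any special software, in line with the advice on open formats.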
Storage
• Short term and long term
  • Different reasons and tools for each
• Short term
  • Hard drives
  • Flash media
• Long term storage
  • DoIT’s file and data storage
  • Disciplinary repository
  • Get a trusted storage solution
• Consult with your local IT person/DoIT for the right solution for you
Cloud Storage
• Option for short-term storage
• Terms of service matter
  • Some systems don’t protect your intellectual property
• Security conscious? Try SpiderOak
• Google Drive through UW
http://arstechnica.com/business/2012/04/spideroak-dropbox-for-the-security-obsessive/
Google Drive
• UW-Madison and Google have an agreement on these apps:
  • Docs (Drive), Sites, Groups, Contacts
• Google cannot use your content stored in these apps, BUT only if you log into Google with your @wisc.edu account
  • Otherwise, your intellectual property is not protected
• Benefits
  • Trusted storage, tech help from DoIT, legal assistance from UW
http://www.doit.wisc.edu/googleapps/
Backups
• Automated whenever possible
• Lots of Copies Keep Stuff Safe (LOCKSS)
  • Local and remote
• External hard drives not recommended
  • Failure risk over time
• Departmental servers
• Bucky Backup
• Consult with DoIT for the right backup solution
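Lots of copies only help if the copies are intact. One common check is a checksum manifest: hash every file in the original and in the backup and compare. This is a minimal sketch using SHA-256 from the standard library; the function names are hypothetical, and a real backup tool may already do this for you.

```python
import hashlib
from pathlib import Path


def sha256(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def manifest(root):
    """Map each file's path (relative to `root`) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare(original, backup):
    """Return (files missing from the backup, files whose contents differ)."""
    a, b = manifest(original), manifest(backup)
    missing = sorted(set(a) - set(b))
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    return missing, changed
```

Saving the manifest itself alongside each copy also lets you detect silent corruption later, even when the original is no longer available to compare against.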
Security
• HIPAA
• FERPA
• Human subject information
• Protect from malware, etc.
• Get assistance: consult with DoIT
Process Planning
• Someone is responsible for managing the data
  • What happens if they leave?
• Documentation
  • Data collection methods
  • Data documentation
  • Backup procedures
  • Data accessibility
  • Preservation plans
  • Security issues
The Future of Data Management
• Digital data allows for many exciting possibilities
• Digital data will affect
  • Laboratory record keeping
  • Funding requirements
  • Types of research
  • Success as a researcher
• But digital data can’t be treated like a physical object
  • Preserving bits is very different from preserving physical objects
  • Need good data management to be able to leverage digital data
Electronic Lab Notebooks
• Notes and data are digital and kept together
• Easier to search
• Easier to organize
• Want to learn more? Come to my session on ELNs:
  • Wednesday, November 14th
  • 12pm-1pm
  • Chemistry Building, Room 9341
Funding
• NSF currently encourages sharing
  • May require sharing in the future
  • Change will happen gradually
• Data sharing means money goes further
  • Data can be used multiple times at no extra cost to the funder
• Will probably see a greater push for data sharing and open access publications
Increased Data Sharing
• Likely to come from funder requirements
  • Public gets a greater return on its investment in research
• Standardized ways to share data
  • Data repositories (disciplinary, local, funder, etc.)
  • Figshare
  • Data publication
• Rewards for sharing data
  • Data citation (Web of Knowledge’s new Data Citation Index)
  • Increased article citations
  • Data may eventually count toward tenure, as articles do
• New science!
  • Meta-analysis and data mining
Data Management
• Data management is important
• You can easily do this in your research
  • Inventory
  • Needs
  • Planning
• Lots of little things can help
  • Small steps build up
• Well-managed data opens up new possibilities
Many Thanks To
• Ryan Schriver for sharing his slides on this topic
• Ariel Neff for helping to organize and promote this session
• Steenbock Library for hosting
Resources
• Research Data Services
  • http://researchdata.wisc.edu/
• DoIT
  • http://www.doit.wisc.edu/
• Your liaison librarian
  • http://www.library.wisc.edu/
• This presentation is available under a Creative Commons Attribution-NonCommercial 3.0 license