Research Data Management

Michigan State University campus policy, resources and best practices for research data management offered by the MSU Libraries Research Data Management Guidance service. http://www.lib.msu.edu/rdmg/

Published in: Education
Research Data Management

  1. 1. MSU Libraries Research Data Management Research Data Management Aaron Collie collie@msu.edu @aaroncollie
  2. 2. MSU Libraries Research Data Management Introductions • Please tell us your name and department • A brief description of your primary research area • What do you consider to be your research data • Experience and/or comfort level with managing research data? cc http://www.flickr.com/photos/quinnanya/
  3. 3. MSU Libraries Research Data Management Agenda • Introduction • Background • The Impetus: NSF Data Management Plan Mandate • The Effect: Policy to Practice • The Response: Changing Data Landscape • Fundamental Practices • File Organization • Data Documentation • Reliable Backup • Data Publishing, Sharing, & Reuse • Protecting Data & Responsible Reuse • Data Lifecycle Resources
  4. 4. MSU Libraries Research Data Management Volunstrordinaries! Aaron Collie Hailey Mooney Devin Higgins Brandon Locke Ranti Junus Thomas Padilla Judy Matthews Tina Qin
  5. 5. MSU Libraries Research Data Management We teach people about RDM • Librarianship • Training • Assessment • Consultation • Ad-hoc • 6-12 new clients per semester • 100% satisfied / 100% would use again • 71% of new clients are referrals • 60% requested additional services • 15% through NFO, 14% through website
  6. 6. MSU Libraries Research Data Management RDM@MSU 101 • Who: You, as the designated steward • What: “the data” • When: Minimum 3 years after publ./degree • Where: Managed networked storage • Why: Legal, Ethical, Scholarly • How: With fidelity and documentation sufficient to reproduce the research
  7. 7. MSU Libraries Research Data Management http://retractionwatch.com/2014/01/07/doing-the-right-thing-authors-retract-brain-paper-with-systematic-human-error-in-coding/
  8. 8. MSU Libraries Research Data Management Jen Doty and Rob O'Reilly, “Learning to Curate @ Emory”. RDAP 2014
  9. 9. MSU Libraries Research Data Management Data Management. Isn’t that… trivial? • Not so much. Data is a primary output of research; it is very expensive to produce high quality data. Data may be collected in nanoseconds, but it takes the expert application of research protocol and design to generate data. CC-BY-SA-3.0 Rob Lavinsky
  10. 10. MSU Libraries Research Data Management Even more consequential, data is the input of a process that generates higher orders of understanding. Wisdom • Knowledge • Information • Data. Understanding is hierarchical! (Russell Ackoff)
  11. 11. MSU Libraries Research Data Management This is the engine of the academic industry…
  12. 12. MSU Libraries Research Data Management
  13. 13. MSU Libraries Research Data Management So, things can get a little messy.
  14. 14. MSU Libraries Research Data Management The scientific method “is often misrepresented as a fixed sequence of steps,” rather than being seen for what it truly is, “a highly variable and creative process” (AAAS 2000:18). Gauch, Hugh G. Scientific Method in Practice. New York: Cambridge University Press, 2010. Print. (Emphasis added)
  15. 15. MSU Libraries Research Data Management
  16. 16. MSU Libraries Research Data Management The Research Depth Chart • Scientific Method • Research Design • Research Method • Research Tasks (axis labels: More Specific, More Generic)
  17. 17. MSU Libraries Research Data Management Problem Identification Study Concept Literature Review Environmental Scan Funding & Proposal Research Design Research Methodology Research Workflow Hypothesis Formation Design Validation Research Activity Data Management Data Organization Data Storage Data Description Data Sharing Scholarly Communication Report Findings Publish Peer Review
  19. 19. MSU Libraries Research Data Management Agenda • Introduction • Background • The Impetus: NSF Data Management Plan Mandate • The Effect: Policy to Practice • The Response: Changing Data Landscape • Fundamental Practices • File Organization • Data Documentation • Reliable Backup • Data Publishing, Sharing, & Reuse • Protecting Data & Responsible Reuse • Data Lifecycle Resources
  20. 20. MSU Libraries Research Data Management Data Management • The process of planning for and implementing a system of care for your research data before, during, and after a research project in order to ensure a (re)usable resource.
  21. 21. MSU Libraries Research Data Management So why are we here? Good science! Government and Research Funder Mandates
  22. 22. MSU Libraries Research Data Management But why are we really here? • Impetus: NSF has mandated that all grant applications submitted after January 18th, 2011 must include a supplemental “Data Management Plan” • Effect: The original NSF mandate has had a domino effect, and many funders now require or state guidelines for data management of grant funded research • Response: Data management has not traditionally received a full treatment in (many) graduate and doctoral curricula; intervention is necessary
  23. 23. MSU Libraries Research Data Management Positive reinforcement…. • National Science Foundation Data Management Plan mandate (January 18, 2011) • Presidential Memorandum on Managing Government Records (August 24, 2012) – Managing Government Records Directive: All permanent electronic records in Federal agencies will be managed electronically to the fullest extent possible for eventual transfer and accessioning by NARA in an electronic format.
  24. 24. MSU Libraries Research Data Management Positive reinforcement… (cont.) • White House policy memo (February 22, 2013) – Increasing Access to the Results of Federally Funded Scientific Research: Federal agencies with more than $100M in R&D expenditures must develop plans to make the published results of federally funded research freely available to the public within one year of publication. • OSTP policy memo (March 20, 2014) – Improving the Management of and Access to Scientific Collections: directs each Federal agency that owns, maintains, or otherwise financially supports permanent scientific collections to develop a draft scientific-collections management and access policy within six months.
  25. 25. MSU Libraries Research Data Management Positive reinforcement… (cont. w/ teeth!) • AHRQ = “…all AHRQ-funded researchers will be required to include a data management plan for sharing final research data in digital format, or state why data sharing is not possible.” • NASA = “This plan extends NASA’s culture of open data access to all NASA-funded research.” • USDA = Phased approach beginning with DMP • More: http://www.arl.org/focus-areas/public-access-policies/federally-funded-research/2696-white-house-directive-on-public-access-to-federally-funded-research-and-data#agency-policies
  26. 26. MSU Libraries Research Data Management Funder Policies NASA “promotes the full and open sharing of all data” “requires that data…be submitted to and archived by designated national data centers.” “expects the timely release and sharing of final research data" "IMLS encourages sharing of research data." “…should describe how the project team will manage and disseminate data generated by the project”
  27. 27. MSU Libraries Research Data Management • Policies for re-use, re-distribution, and creation of derivatives • Plans for archiving data, samples, and other research outcomes, maintaining access • Types of data, samples, physical collections, software generated • Standards for data and metadata format and content • Access and sharing policies, with stipulations for privacy, confidentiality, security, intellectual property, or other rights or requirements
  28. 28. MSU Libraries Research Data Management • NSF will not evaluate any proposal missing a DMP • PI may state that project will not generate data • DMP is reviewed as part of intellectual merit or broader impacts of application, or both • Costs to implement DMP may be included in proposal’s budget • May be up to two pages long
  29. 29. MSU Libraries Research Data Management • Investigators seeking $500,000 or more in direct costs in any year should include a description of how final research data will be shared, or explain why data sharing is not possible. • The precise content of the data-sharing plan will vary, depending on the data being collected and how the investigator is planning to share the data. • More stringent data management and sharing requirements may be required in specific NIH Funding Opportunity Announcements. Principal Investigators must discuss how these requirements will be met in their Data Sharing Plans.
  30. 30. MSU Libraries Research Data Management • Roles and responsibilities • Expected Data • Period of data retention • Data formats and dissemination • Data storage and preservation of access
  31. 31. MSU Libraries Research Data Management Local Policy University Research Council Best Practices: https://rio.msu.edu/research-data Research Data: Management, Control, and Access – To assure that research data are appropriately recorded, archived for a reasonable period of time, and available for review under the appropriate circumstances. • Ownership = MSU • “Stewardship” = You • Period of Retention = 3 years • Transfer of Responsibility = Written Request
  32. 32. MSU Libraries Research Data Management Broader Response: Changing Data Landscapes • Data Management Competencies – Standards & Best Practices – Discipline Specific Discourse • Data sharing and open data – Data sets as publications – Data journals – Citations for data (e.g., used in secondary analysis) – Data as supplementary materials to traditional articles – Data repositories and archives
  33. 33. MSU Libraries Research Data Management Curation responsibilities (Carlson, The Chronicle, 2006) “Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.” big science data small science data institution? domain? MacColl, John (2010). The Role of libraries in data curation. RLG Partnership Annual Meeting, Chicago. June 2010
  34. 34. MSU Libraries Research Data Management What’s in it for me? • Better organization = fewer headaches – Course management – Bibliographic management – File management – Research • Career advancement – Publish datasets and list on your CV – Data management is an “unnamed practice” – name it for yourself and your students!
  35. 35. MSU Libraries Research Data Management Data Sharing Impacts • Reinforces open scientific inquiry • Encourages diversity of analysis and opinion • Promotes new research, testing of new or alternative hypotheses and methods of analysis • Supports studies on data collection methods and measurement Cc http://www.flickr.com/photos/pinchof_10/
  36. 36. MSU Libraries Research Data Management Data Sharing Impacts • Facilitates education of new researchers • Enables exploration of topics not envisioned by initial investigators • Permits creation of new datasets by combining data from multiple sources
  37. 37. MSU Libraries Research Data Management Agenda • Introduction • Background • The Impetus: NSF Data Management Plan Mandate • The Effect: Policy to Practice • The Response: Changing Data Landscape • Fundamental Practices • File Organization • Data Documentation • Reliable Backup • Data Publishing, Sharing, & Reuse • Protecting Data & Responsible Reuse • Data Lifecycle Resources
  38. 38. MSU Libraries Research Data Management Research Data Management Fundamentals • Documentation • File Organization • Storage & Backup • Data Publishing, Sharing, & Reuse • Protecting Data & Responsible Reuse
  39. 39. MSU Libraries Research Data Management Documentation Practices: Overview • Researchers benefit from proper documentation to decipher or reuse their datasets – even prior to thinking about sharing • Think “downstream”
  40. 40. MSU Libraries Research Data Management Documentation Practices: Overview 1. At minimum create a README file that you can use to document your project 2. Utilize standards for describing data including Metadata Standards 3. If applicable, use in-line code commentary to explain code (cc) Will Scullin
  41. 41. MSU Libraries Research Data Management Create a README file • At minimum, store documentation in readme.txt file or equivalent, with data – What data consists of – How it was collected – Restrictions to distribution or use – Other descriptive information
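The README items listed on the slide above can be turned into a reusable template. The following is a minimal Python sketch, not part of the original deck: the function names `build_readme` and `write_readme` and the exact heading wording are illustrative assumptions, and the file name follows the slide's "readme.txt file or equivalent".

```python
from pathlib import Path

# Headings mirror the four items on the slide; the exact wording is an
# illustrative choice, not an MSU requirement.
README_SECTIONS = (
    "What the data consists of",
    "How it was collected",
    "Restrictions on distribution or use",
    "Other descriptive information",
)


def build_readme(project_title: str, answers: dict[str, str]) -> str:
    """Return README text with one section per documentation item."""
    lines = [project_title, "=" * len(project_title), ""]
    for heading in README_SECTIONS:
        lines.append(heading + ":")
        # A visible placeholder makes undocumented items easy to spot later.
        lines.append("  " + answers.get(heading, "TODO"))
        lines.append("")
    return "\n".join(lines)


def write_readme(data_dir: Path, text: str) -> Path:
    """Store the README next to the data it describes."""
    target = data_dir / "readme.txt"
    target.write_text(text, encoding="utf-8")
    return target
```

Calling `write_readme(Path("my_project/data"), build_readme("My Project", {}))` would produce a file with four "TODO" placeholders to fill in; the directory name here is hypothetical and must already exist.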
  42. 42. MSU Libraries Research Data Management Use Metadata Standards • “Data about data” • Standardized way of describing data • Explains who, what, where, when of data creation and methods of use • Data more easily found • Data more easily compared to other data sets
  43. 43. MSU Libraries Research Data Management Use Metadata Standards Basic project metadata: • Title • Language • File Formats • Creator • Dates • File Structure • Identifier • Location • Variable List • Subject • Methodology • Code Lists • Funders • Data Processing • Versions • Rights • Sources • Checksums • Access Information • List of File Names
  44. 44. MSU Libraries Research Data Management Use Metadata Standards • Dublin Core: Commonly used descriptive metadata format that facilitates dataset discovery across the Web. • Data Documentation Initiative (DDI): Defines metadata content, presentation, transport, and preservation for the social and behavioral sciences. • ISO 19115:2003: Describes geographic data such as maps and charts. • More examples: http://www.lib.msu.edu/about/diginfo/collect.jsp
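Several of the basic project metadata fields listed earlier (file names, file formats, checksums) can be generated rather than typed by hand. The sketch below is a hedged Python illustration, not a Dublin Core or DDI serialization: the JSON key names and the helper names `sha256_of`, `project_metadata`, and `to_json` are assumptions made for this example.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a SHA-256 checksum, reading the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def project_metadata(title: str, creator: str, rights: str, data_dir: Path) -> dict:
    """Collect hand-entered fields plus fields derived from the files."""
    files = sorted(p for p in data_dir.iterdir() if p.is_file())
    return {
        "title": title,
        "creator": creator,
        "rights": rights,
        # Derived fields: regenerated on each run so they stay current.
        "file_names": [p.name for p in files],
        "file_formats": sorted({p.suffix for p in files if p.suffix}),
        "checksums": {p.name: sha256_of(p) for p in files},
    }


def to_json(record: dict) -> str:
    """Serialize the record as plain, diff-friendly JSON."""
    return json.dumps(record, indent=2, sort_keys=True)
```

Recomputing the checksums later and comparing them with the stored values is one simple way to detect silent corruption of a data file.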
  45. 45. MSU Libraries Research Data Management Use In-Line Code Commentary Example of R code commentary # Cumulative normal density pnorm(c(-1.96,0,1.96)) • If applicable, in-line code commentary helps explain code
  46. 46. MSU Libraries Research Data Management File Organization Practices: Overview 1. Design a file plan for your research project 2. Use file naming conventions that work for your project 3. Choose file formats to maximize usefulness “When I was a freshmen I named my assignments Paper Paperr Paperrr Paperrrr” -Undergrad
  47. 47. MSU Libraries Research Data Management Design a File Plan • File structure is the framework • Classification system makes it easier to locate folders/files • Benefits: – Simple organization intuitive to team members and colleagues – Reduces duplicate copies in personal drives and e-mail attachments
  48. 48. MSU Libraries Research Data Management Design a File Plan Choose a sortable directory hierarchy • Example 1: Investigator, Process, Date: Collie / TEI_Encoding / 20110117 • Example 2: Instrument, Date, Sample: Usability Survey / 2012043 / sample_1
  49. 49. MSU Libraries Research Data Management Design a File Plan Example documentation of Directory Hierarchy: /[Project]/[Grant Number]/[Event]/[Investigator/Date]
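A documented hierarchy like the one above is easier to follow consistently when paths are built by a small helper. This is a minimal Python sketch under stated assumptions: the function name `project_dir` is hypothetical, and the final `[Investigator/Date]` level is interpreted here as a single folder combining investigator and date, which is one possible reading of the template.

```python
from datetime import date
from pathlib import PurePosixPath


def project_dir(project: str, grant_number: str, event: str,
                investigator: str, collected: date) -> PurePosixPath:
    """Build /[Project]/[Grant Number]/[Event]/[Investigator_Date]."""
    parts = [project, grant_number, event, investigator]
    for part in parts:
        # Reject empty names and embedded separators, which would
        # silently change the depth of the hierarchy.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    # YYYYMMDD keeps sibling folders in chronological order when sorted.
    leaf = f"{investigator}_{collected:%Y%m%d}"
    return PurePosixPath("/", project, grant_number, event, leaf)
```

For example, `project_dir("TextMining", "GRANT-0001", "TEI_Encoding", "Collie", date(2011, 1, 17))` yields `/TextMining/GRANT-0001/TEI_Encoding/Collie_20110117`; the project name and grant number are made up for illustration.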
  50. 50. MSU Libraries Research Data Management Use File Naming Conventions – Enable better access/retrieval of files – Create logical sequences for file sorting – More easily identify what you’re searching for
  51. 51. MSU Libraries Research Data Management Use File Naming Conventions • Meaningful but short (255-character limit) • Use alphanumeric characters – Example: abc123 • Capital letters or underscores differentiate between words • Surname first followed by initials of first name
  52. 52. MSU Libraries Research Data Management Use File Naming Conventions • Year-month-day format for dates, with or without hyphens – Example 1: 2006-03-13 – Example 2: 20060313 • Decide on a simple versioning method – Example: file_v001
  53. 53. MSU Libraries Research Data Management Use File Naming Conventions • To create consistent file names, specify a template such as: [investigator]_[descriptor]_[YYYYMMDD].[ext] • This: sharpeW_krillMicrograph_backscatter3_20110117.tif (not this: KrillData2011.tif) • This: borgesJ_collocation_20080414.xml (not this: Borges_Textbase.xml)
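The naming template above can be both generated and checked mechanically. The following Python sketch is an illustration only: `build_name`, `follows_convention`, and the regular expression are assumptions written for this example, and the pattern allows underscores inside the descriptor because the slide's own krill example uses them.

```python
import re
from datetime import date, datetime

# [investigator]_[descriptor]_[YYYYMMDD].[ext]
NAME_PATTERN = re.compile(
    r"(?P<investigator>[A-Za-z0-9]+)_"
    r"(?P<descriptor>[A-Za-z0-9_]+)_"
    r"(?P<date>\d{8})\."
    r"(?P<ext>[A-Za-z0-9]+)"
)


def build_name(investigator: str, descriptor: str, collected: date, ext: str) -> str:
    """Fill in the template with a sortable YYYYMMDD date."""
    return f"{investigator}_{descriptor}_{collected:%Y%m%d}.{ext}"


def follows_convention(file_name: str) -> bool:
    """Return True if the name matches the template and the date is real."""
    match = NAME_PATTERN.fullmatch(file_name)
    if match is None:
        return False
    try:
        # Rejects impossible dates such as month 13.
        datetime.strptime(match["date"], "%Y%m%d")
    except ValueError:
        return False
    return True
```

Under this pattern, the two "This" names on the slide pass the check and the two "Not This" names fail it. A check like this could be run over a project folder before depositing data to catch stragglers.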
  54. 54. MSU Libraries Research Data Management Choose Appropriate File Formats • Non-proprietary • Open, documented standard • Common usage by research community • Standard representation (ASCII, Unicode) • Unencrypted • Uncompressed
  55. 55. MSU Libraries Research Data Management Choose Appropriate File Formats (Format Genre: Optimal Standards) • TEXT: .txt; .odt; .xml; .html • AUDIO: .flac; .wav • VIDEO: .mp2/.mp4; .mkv • IMAGE: .tif; .png; .svg; .jpg • DATA: .sql; .csv
  56. 56. MSU Libraries Research Data Management Storage & Backup Practices 1. Avoid single points of failure 2. Ensure data redundancy & replication 3. Understand common types of storage (cc) George Ornbo Data at significant risk of loss without storage and backup plan
  57. 57. MSU Libraries Research Data Management Avoid Single Points of Failure A single point of failure occurs when it would only take one event to destroy all data on a device • Use managed networked storage when possible • Move data off of portable media • Never rely on one copy of data • Do not rely on CD or DVD copies to be readable • Be wary of software lifespans
  58. 58. MSU Libraries Research Data Management Ensure Data Redundancy • Effective data storage plan provides for 3 copies: – Primary authoritative copy – Secondary local backup – Tertiary remote backup • Geographically distribute and secure – Local vs. remote, depending on needed recovery time • Personal computer, external hard drives, departmental, or university servers may be used
  59. 59. MSU Libraries Research Data Management Ensure Data Redundancy • Cloud storage – Amazon S3 – Google – MS Azure – DuraCloud – Rackspace – Glacier Note that many enterprise cloud storage services charge for data transfers in and out $$$
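The three-copy plan under Ensure Data Redundancy (primary, secondary local, tertiary remote) is only useful if the copies are known to match. Below is a minimal Python sketch, assuming the backup locations are reachable as ordinary directories such as a mounted network share; `replicate` and `sha256_of` are hypothetical helper names, and cloud services would need their own upload tools instead.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a SHA-256 checksum, reading the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replicate(primary: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy the primary file to each backup directory and verify it."""
    expected = sha256_of(primary)
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps where the platform allows it.
        copy = Path(shutil.copy2(primary, directory / primary.name))
        if sha256_of(copy) != expected:
            raise OSError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies
```

Passing one local and one remote directory gives the secondary and tertiary copies; the verification step is what distinguishes a backup from a hopeful copy.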
  60. 60. MSU Libraries Research Data Management Understand Common Types of Storage • Optical Media • Portable Flash Media • Commercial Hard Drives • Commercial NAS • Cloud Storage • Enterprise Network Storage • Trusted Archival Storage
  61. 61. MSU Libraries Research Data Management Understand Common Types of Storage • Features of storage types: • Portable data transfers • Short-term storage • Project term storage • Networked data transfer • Long-term storage • Reliable backup option
  62. 62. MSU Libraries Research Data Management Understand Common Types of Storage (columns: Portable Data Transfer | Short Term Storage | Project Term Storage | Networked Data Transfer | Long Term Storage | Reliable Backup Option) • Optical Media: ✔ ✗ ✗ ✗ ✗ ✗ • Portable Flash Media: ✔ ✔ ✗ ✗ ✗ ✗ • Commercial Hard Drives: ✔ ✔ ✔ ✗ ✗ ✗ • Commercial NAS: ✗ ✔ ✔ ✔ ✗ ✗ • Cloud Storage: ✗ ✔ ✔ ✔ ✗ ✗ • Enterprise Network Storage: ✗ ✔ ✔ ✔ ✔ ✔ • Trusted Archival Storage: ✗ ✗ ✗ ✔ ✔ ✔
  63. 63. MSU Libraries Research Data Management Understand Common Types of Storage (Media: Storage @ MSU) • Optical Media: MSU Computer Store - Sells Optical Media and hardware accessories | UAHC Media Storage Service - Offers physical lock-box like storage for MSU • Flash Media: MSU Computer Store - Sells Optical Media and hardware accessories | UAHC Media Storage Service - Offers physical lock-box like storage for MSU • Commercial Hard Drives: MSU Computer Store - Sells Optical Media and hardware accessories | UAHC Media Storage Service - Offers physical lock-box like storage for MSU • Enterprise Cloud Storage: Angel - Free. Ideal for collaboration; not storage space. Phase out 2015 | Desire2Learn - Free. Ideal for collaboration; not storage space. Replaces Angel | GoogleApps - Free. Ideal for collaboration; not intended as storage space • Enterprise Network Storage: AFS Space - Free to 1GB, add’l space can be purchased w/dept. account | IT Services Individual, Mid-Tier and Enterprise Storage - Fee based | HPCC Home or Research - Free up to 1TB. Fee based additions available • Trusted Archival Storage: Disciplinary Repositories - Disciplinary repositories offer archival services for pertinent research data.
  64. 64. MSU Libraries Research Data Management Data Publishing, Sharing, Reuse 1. Time-intensive, with potentially high return on investment 2. Publish data in several data publication venues to more broadly share results of research Research datasets on par with peer-reviewed journal articles as first-class scholarly contributions
  65. 65. MSU Libraries Research Data Management Sharing & Publishing Data • Data preparation for sharing and publication is a time-intensive process • Potential positive outcomes: • Increased research impact and citations • Enable additional scientific inquiry • Opportunities for co-authorship and collaboration • Enhance your grant proposal’s competitiveness
  66. 66. MSU Libraries Research Data Management Data Publication Venues • Multiple ways to publish research data • Faculty or project website • Journal supplementary materials • Disciplinary data repository (data archive) • Varying levels of support for indexing, access controls, and long-term curation
  67. 67. MSU Libraries Research Data Management Data Publication Venues • Disciplinary Data Repository • Securely share data, ensure long-term access • High visibility • Often offer persistent citations • Availability varies across domains • Databib.org directory
  69. 69. MSU Libraries Research Data Management Protecting Data & Responsible Reuse 1. Consider how to protect data and intellectual property rights while encouraging reuse 2. Keep in mind ethical concerns when sharing data (cc) Will Scullin
  70. 70. MSU Libraries Research Data Management Intellectual Property • IP refers to exclusive rights of creators of works • Individual data cannot be protected by US copyright • Organization of data such as database, creative work produced by data, and research instruments used may be protected ©
  71. 71. MSU Libraries Research Data Management Intellectual Property • Principal investigator’s institution holds IP rights • Provide clearly stated license for producing derivatives, reusing, and redistributing datasets • License under Creative Commons • State if any restrictions or embargos on use • Provide example of how work should be cited to encourage proper attribution on reuse • Document any IP / copyright issues
  72. 72. MSU Libraries Research Data Management Ethics & Data Sharing • Keep in mind the following ethical concerns when sharing your data: • Privacy • Confidentiality • Security and integrity of the data • For data involving human subjects, obtain written permission or consent stating how the data may be reused
  73. 73. MSU Libraries Research Data Management Best Practices = High Impact Data • File organization ensures easier access and retrieval of data • Documentation makes datasets accessible and intelligible to users • Storage and backup safeguards data • Data publishing and sharing encourages the most widespread reuse of data • Data protection ensures responsible reuse
  74. 74. MSU Libraries Research Data Management Agenda • Introduction • Background • The Impetus: NSF Data Management Plan Mandate • The Effect: Policy to Practice • The Response: Changing Data Landscape • Fundamental Practices • File Organization • Data Documentation • Reliable Backup • Data Publishing, Sharing, & Reuse • Protecting Data & Responsible Reuse • Data Lifecycle Resources
  75. 75. MSU Libraries Research Data Management http://www.lib.msu.edu/rdmg
  76. 76. MSU Libraries Research Data Management Contact Aaron Collie collie@msu.edu @aaroncollie http://www.lib.msu.edu/rdmg
