Conquering Chaos in the Age of Networked Science: 
Research Data Management* 
*Adaptation of the NECDMC First Module 
Kathryn M. Houk, MLIS 
Tufts University Hirsh Health Sciences Library 
Wednesday June 4, 2014 
Librarians: Your Partners in Research
Today’s Objectives 
• Recognize what research data is and what data 
management entails 
• Recognize why managing data is important 
• Identify common data management issues 
• Learn best practices and resources for managing these 
issues 
• Learn about how the library can help you identify data 
management resources, tools, and best practices
What is Data? 
• “Research data, unlike other types of information, is 
collected, observed, or created, for purposes of 
analysis to produce original research results” 
(University of Edinburgh). 
• Observational 
• Experimental 
• Simulation data 
• Derived or compiled data
Why Should I Manage it? 
• Transparency & Integrity 
• Compliance
Science & Personal Benefits 
• Who uses your data now? 
• Who COULD use your data? 
• Shared/Open Data 
• Scientific progress 
• Impact on your career 
• Citation counts
What if I Don’t Consider RDM? 
Data Sharing and Management Snafu in 3 Short Acts: 
A data management horror story by Karen Hanson, 
Alisa Surkis and Karen Yacobucci. 
http://www.youtube.com/watch?v=N2zK3sAtr-4
Data Management Planning vs. a DMP
Data Management Plans 
• What types of data will be created? 
• Who will own, have access to, and be responsible 
for managing these data? 
• What equipment and methods will be used to 
capture and process data? 
• Where will data be stored during and after the project?
Simplified Data Management Plan 
1. Types of data 
• What types of data will you be creating or capturing? (experimental measures, observational 
or qualitative, model simulation, existing) 
• How will you capture, create, and/or process the data? (Identify instruments, software, 
imaging, etc. used) 
2. Contextual Details (Metadata) Needed to Make Data Meaningful to others 
• What file formats and naming conventions will you be using? 
3. Storage, Backup and Security 
• Where and on what media will you store the data? 
• What is your backup plan for the data? 
• How will you manage data security? 
4. Provisions for Protection/Privacy 
• How are you addressing any ethical or privacy issues (IRB, anonymization of data)? 
• Who will own any copyright or intellectual property rights to the data? 
5. Policies for re-use 
• What restrictions need to be placed on re-use of your data? 
6. Policies for access and sharing 
• What is the process for gaining access to your data? 
7. Plan for archiving and preservation of access 
• What is your long-term plan for preservation and maintenance of the data?
Creating a DMP & Considering 
Long-Term DM Issues 
• Read the case study provided 
• Your group is assigned a set of questions (labeled Group 1-6) 
to answer as best you can 
• The first set of questions is from one section of the simplified DMP 
• The second set of questions highlights an issue that arises in day-to-day or 
long-term management of research data (a more detailed level) 
• Elect a group speaker 
• Each group will discuss their answers 
• We will go over the issue associated with your section, common 
problems, and best practices
Group 1 
• DMP Section 1: Types of Data 
1. What types (e.g. images, lists of readings, text documents) of 
data are being collected for this study? 
2. What analytical methods and tools are being used in this 
study? 
3. What types of data will be generated from these analytical 
tools and methods? 
• Detailed Planning 
1. What naming conventions are being used in the lab? 
2. Is there a structure for saving files in the lab? 
3. What kind of information would you include in a naming 
convention for files? 
4. What kinds of things would you avoid in naming/labeling files?
Issue: Records Management 
• Does this sound familiar? 
• Inconsistently labeled files 
• in multiple versions… 
• inside poorly structured folders… 
• stored on multiple media… 
• in multiple locations… 
• and in various formats…
Issue: Records Management 
• Best Practices: 
• Avoid special characters in a file name. 
• Use capitals or underscores instead of periods or spaces. 
• Use 25 or fewer characters. 
• Use documented & standardized descriptive information 
about the project/experiment. 
• Use the ISO 8601 date format: YYYYMMDD. 
• Include a version number.
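The naming rules above can be sketched as a small helper that assembles a file name and checks it. This is only an illustration, not part of the original curriculum: the function names, the `project_description_YYYYMMDD_vNN` layout, and the choice to apply the 25-character limit to the name without its extension are all assumptions.

```python
import re
from datetime import date

MAX_LENGTH = 25  # limit from the slide; applied to the name without extension (assumption)
ALLOWED = re.compile(r"^[A-Za-z0-9_]+$")  # letters, digits, and underscores only


def check_stem(stem: str) -> list[str]:
    """Return a list of rule violations for a file name without its extension."""
    problems = []
    if not ALLOWED.match(stem):
        problems.append("contains spaces, periods, or special characters")
    if len(stem) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if not re.search(r"(?<!\d)\d{8}(?!\d)", stem):
        problems.append("no YYYYMMDD date")
    if not re.search(r"_v\d+$", stem):
        problems.append("no version number")
    return problems


def build_file_name(project: str, description: str, day: date, version: int, ext: str) -> str:
    """Assemble project_description_YYYYMMDD_vNN.ext, raising if a rule is broken."""
    stem = f"{project}_{description}_{day:%Y%m%d}_v{version:02d}"
    problems = check_stem(stem)
    if problems:
        raise ValueError(f"{stem!r}: " + "; ".join(problems))
    return f"{stem}.{ext}"


# Hypothetical project and description labels, for illustration only.
print(build_file_name("Gluc", "raw", date(2014, 6, 4), 1, "csv"))
```

A lab would normally record its own layout and abbreviations in a shared document so that every member builds names the same way.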
Issue: Records Management
Group 2 
• DMP Section 2: Contextual Details (Metadata) 
1. What contextual details would the researcher need to document 
to make her data meaningful to others? 
2. How would a lack of naming and labeling conventions impact later 
data access by other researchers and possibly herself? 
• Detailed Planning 
1. What general information do you think is needed for scientific data 
to make it discoverable? (e.g., think of a search screen and a 
dropdown menu of fields you can search by: Title, Author, 
Genre, etc.) 
2. Are you aware of any metadata standards for the life or health 
sciences? 
3. Do you think all metadata has to be hand-entered or recorded? 
4. How would you ensure lab members knew to collect and record 
specific information in standard ways?
Issue: Metadata 
• How will someone make sense of your data (e.g., the cells 
and values of your spreadsheet)? 
• What universal or disciplinary standards could be used to 
label your data? 
• How can you describe a data set to make it 
discoverable?
Issue: Metadata 
• Biology and health-specific metadata examples
Issue: Metadata 
• Title 
• Creator 
• Identifier 
• Subject 
• Funders 
• Rights 
• Access information 
• Language 
• Dates 
• Location 
• Methodology 
• Data processing 
• Sources 
• List of file names 
• File Formats 
• File structure 
• Variable list 
• Code lists 
• Versions 
• Checksums
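Several of the fields listed above (list of file names, file formats, checksums) can be generated rather than typed by hand. The following is a minimal sketch under stated assumptions: the function names and the layout of the record are hypothetical, SHA-256 is one common checksum choice rather than a requirement, and the record does not follow any particular metadata standard.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute a SHA-256 checksum by reading the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def describe_dataset(folder: Path, title: str, creator: str) -> dict:
    """Build a minimal record listing each file, its format, and its checksum."""
    files = sorted(p for p in folder.rglob("*") if p.is_file())
    return {
        "title": title,
        "creator": creator,
        "files": [
            {
                "name": p.relative_to(folder).as_posix(),
                "format": p.suffix.lstrip(".").lower() or "unknown",
                "sha256": sha256_of(p),
            }
            for p in files
        ],
    }


def save_record(record: dict, out_path: Path) -> None:
    """Write the record as indented JSON, a plain-text format."""
    out_path.write_text(json.dumps(record, indent=2), encoding="utf-8")
```

Saving the record outside the data folder keeps the metadata file itself out of the file list the next time the record is generated.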
Issue: Metadata 
• Best Practices 
• Describe the contents of data files 
• Define the parameters and their units 
• Explain the formats for dates, time, geographic coordinates, 
and other parameters 
• Define any coded values 
• Describe quality flags or qualifying values 
• Define missing values
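One way to act on these practices is to keep a data dictionary next to the data file. The sketch below is illustrative only: every variable name, unit, code, and the `-999` missing-value marker is hypothetical and would be replaced by whatever the lab actually records.

```python
MISSING = "-999"  # hypothetical code the lab uses for a missing value

# Hypothetical data dictionary: one entry per column in the data file.
DATA_DICTIONARY = {
    "subject_id": {"description": "Anonymized participant identifier", "units": None},
    "visit_date": {"description": "Date of visit", "units": "ISO 8601, YYYYMMDD"},
    "glucose": {"description": "Fasting blood glucose", "units": "mg/dL"},
    "sex": {
        "description": "Sex recorded at enrollment",
        "units": None,
        "codes": {"1": "female", "2": "male"},
    },
    "qc_flag": {
        "description": "Quality flag for the glucose reading",
        "units": None,
        "codes": {"0": "passed", "1": "suspect"},
    },
}


def decode_row(row: dict) -> dict:
    """Translate raw values into readable ones using the data dictionary."""
    decoded = {}
    for name, raw in row.items():
        spec = DATA_DICTIONARY[name]  # a KeyError here flags an undocumented variable
        if raw == MISSING:
            decoded[name] = None
        else:
            decoded[name] = spec.get("codes", {}).get(raw, raw)
    return decoded
```

Because an undocumented column raises an error, the dictionary has to be updated whenever a new variable is added to the data file.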
Group 3 
• DMP Section 3: Data Backup, Storage, and Security 
1. Where and on what media will the data from each source be 
stored? 
2. How, how often, and where will the data be backed up? 
3. Are there any security concerns for the data and have they 
been addressed? 
• Detailed Planning 
1. How many copies of your data do you think you should have 
and where should you keep them? 
2. Is there any group on campus you think could help you with 
backup and security/access concerns? 
3. What are some good data storage and backup practices you 
know about or practice?
Issue: Backup & Security 
• How often should data be backed up? 
• How many copies of data should you have? 
• Where can you store your data? 
• How much server space can I get?
Issue: Backup & Security 
• Best Practices 
• Make 3 copies (original + external/local + external/remote) 
• Have them geographically distributed (local vs. remote) 
• Use a hard drive (e.g. Vista backup, Mac Time Machine, UNIX 
rsync) or tape backup system 
• Cloud Storage - some examples of private sector storage 
resources include: (Amazon S3, Elephant Drive, Jungle 
Disk, Mozy, Carbonite) 
• Unencrypted storage is ideal because it keeps your data 
most easily readable by you and others in the future. If you 
do need to encrypt your data because of human subjects, 
then: 
• Keep passwords and keys on paper (2 copies) and in a PGP 
(Pretty Good Privacy) encrypted digital file 
• Uncompressed storage is also ideal. If you need to 
compress data to conserve space, limit compression to your 3rd 
backup copy
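The "3 copies" practice above can be sketched as a copy step followed by a checksum comparison, so that a damaged copy is caught at backup time. This is a minimal illustration, not a replacement for a managed backup system: the function names are hypothetical, and the destination folders stand in for an external drive and a mounted remote location.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file; reads it fully into memory, so suited to small files."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def back_up(original: Path, destinations: list[Path]) -> list[Path]:
    """Copy `original` into each destination folder and verify each copy by checksum."""
    expected = file_digest(original)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, folder))  # copy2 also keeps timestamps
        if file_digest(copy) != expected:
            raise IOError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies
```

Calling `back_up(data_file, [local_drive, remote_mount])` with two destinations yields the original plus two verified copies.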
Group 4 
• DMP Sections 4. Data protection/privacy and 5. Policies for 
reuse of data 
1. How is the lab addressing any privacy or ethical issues? 
2. Who will own any copyright or intellectual property rights to 
the data? 
3. Are there any restrictions to the reuse of the data? 
• Detailed Planning 
1. Are there any reasons to not share or reuse data? Are these 
ethical or cultural issues? 
2. Will having public funding affect data sharing and reuse 
differently than having private funding? 
3. Who has the right to make decisions about reuse of your data?
Issue: Ownership & Retention 
• Intellectual Property Policy 
• IRB data retention policy 
• Funders’ data retention policy 
• Publishers’ data retention policy 
• Federal and State laws
Issue: Ownership & Retention 
• How long is long enough?
Issue: Ownership & Retention 
• IRB OHRP Requirements: 45 CFR 46 requires research records to be retained 
for at least 3 years after the completion of the research. 
• HIPAA Requirements: Any research that involved collecting identifiable health 
information is subject to HIPAA requirements. As a result records must be 
retained for a minimum of 6 years after each subject signed an authorization. 
• FDA Requirements: 21 CFR 312.62(c). Any research that involved drugs, 
devices, or biologics being tested in humans must have records retained for a 
period of 2 years following the date a marketing application is approved for the 
drug for the indication for which it is being investigated; or, if no application is 
to be filed or if the application is not approved for such indication, until 2 years 
after the investigation is discontinued and FDA is notified. 
• VA Requirements: At present records for any research that involves the VA 
must be retained indefinitely per VA federal regulatory requirements. 
• Intellectual Property Requirements: Any research data used to support a 
patent must be retained for the life of the patent in accordance with 
Intellectual Property Policy. 
• Funder and Publisher Requirements: Check with your funder and publisher. 
• Questions of data validity: If there are questions or allegations about the validity 
of the data or appropriate conduct of the research, you must retain all of the 
original research data until such questions or allegations have been completely 
resolved.
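When more than one of these requirements applies, a working assumption is that the longest one governs. The arithmetic for two of the rules above (3 years after completion, 6 years after the last signed authorization) can be sketched as follows. The function names are hypothetical, the "longest rule wins" logic is an assumption rather than something the slide states, and the FDA, VA, patent, funder, and publisher rules are not modeled.

```python
from datetime import date
from typing import Optional


def add_years(day: date, years: int) -> date:
    """Shift a date by whole years; 29 Feb falls back to 28 Feb in non-leap years."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:
        return day.replace(year=day.year + years, day=28)


def earliest_disposal(completed: date, last_authorization: Optional[date] = None) -> date:
    """Latest end date among the minimum retention periods that apply."""
    candidates = [add_years(completed, 3)]  # 45 CFR 46: 3 years after completion
    if last_authorization is not None:
        # HIPAA: 6 years after the most recent subject authorization
        candidates.append(add_years(last_authorization, 6))
    return max(candidates)


# Hypothetical dates: study completed in June 2014, last authorization in January 2013.
print(earliest_disposal(date(2014, 6, 4), date(2013, 1, 15)))
```

The result is a lower bound only; open questions about data validity, or any of the rules not modeled here, can extend it.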
Group 5 
• DMP Section 6: Policies for access and sharing 
1. How will others be able to gain future access to the study 
data? 
2. How does the graduate student plan to link her datasets to her 
published article? 
• Detailed Planning 
1. Could there be a use for the graduate student’s data that was 
not used in the published article? 
2. Are the data the student collected in open or proprietary formats 
(will people need specialized software to access and interpret 
the data)? 
a) How would this affect future accessibility & reuse?
Group 6 
• DMP Section 7: Plan for archiving and preservation of access 
1. What is the long-term strategy for maintaining, curating and 
archiving the data? 
2. Where will the data be stored? 
3. What contextual data (data that describes your data) or other 
related data will be included in the archive? 
• Detailed Planning 
1. What data should be included in an archive? 
2. Do you know of any data repositories that you could use for your 
data? 
3. How can you ensure that your data is discoverable and 
interpretable? 
4. How long should the data be maintained? What factors affect the 
length of time you retain your data?
Issue: Long-Term Planning 
• What will happen to my data after my project ends? 
• How can I appraise the value of my data? 
• What are my options for archiving and preserving my 
data? 
• What are my options for publishing and sharing data?
Data Formats 
• Is the file format open (i.e. open source) or closed 
(i.e. proprietary)? 
• Is a particular software package required to read 
and work with the data file? If so, the software 
package, version, and operating system platform 
should be cited in the metadata 
• Do multiple files comprise the data file structure? If 
so, that should be specified in the metadata
Open vs. Proprietary Formats Used 
in Research Labs
Issue: Long-Term Planning 
• Best Practices 
• When choosing a file format, select a consistent 
format that can be read well into the future and is 
independent of changes in applications. 
• Non-proprietary, open, documented-standard, 
unencrypted, uncompressed, ASCII-formatted 
files are the most likely to remain readable in the future.
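As an illustration of the format advice above, tabular data can be exported to an uncompressed, ASCII-only CSV file and then checked. This is a sketch with hypothetical function names and sample values; CSV is one example of an open text format, not the only suitable one.

```python
import csv
from pathlib import Path


def is_plain_ascii(path: Path) -> bool:
    """True if every byte in the file is 7-bit ASCII."""
    return path.read_bytes().isascii()


def export_csv(rows: list[dict], path: Path) -> None:
    """Write rows to an uncompressed, comma-separated ASCII text file.

    A non-ASCII character in any value raises UnicodeEncodeError,
    which surfaces the problem at export time.
    """
    with path.open("w", newline="", encoding="ascii") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)
```

A call such as `export_csv([{"sample": "A1", "od600": 0.42}], Path("plate_20140604_v01.csv"))` (hypothetical values) produces a file that any text editor or spreadsheet program can open.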
Issue: Long-Term Planning 
• Librarians can help: 
• Identify file formats suitable for long-term preservation 
• Interpret your funder or publisher’s repository 
requirements 
• Find and evaluate a suitable repository for your data 
• Upload your data sets to a repository 
• Make your data in a repository searchable and 
discoverable 
• Create a DOI and persistent ID 
• Choose metadata standards for increased 
discoverability
Issue: Data Stewardship 
• Challenges 
• Team Science 
• Managing Laboratory Notebooks 
• Rotating Lab Personnel
Issue: Data Stewardship 
• Best Practices 
• Define roles and assign responsibilities for data 
management 
• Identify skills needed to perform tasks outlined in 
DMP and match to available staff 
• Develop training plans for continuity 
• Assign responsible parties and monitor results
How the Library Can Help: 
• Teach you, your lab, or your classes about data management best practices 
• Write a data management and/or sharing plan 
• Comply with federal, funder, and publisher data sharing policies 
• Find & submit your data to a repository 
• Find standards to describe & label your data & data files 
• Find a data set 
• Cite others’ data 
• Publish a data set 
• Get a DOI for a data set 
• Measure the citation impact of your data set 
• Build a collection of research data that others can search & access 
• Archive & preserve your data 
• Learn about copyright & license issues surrounding your data
Find Help 
• Ask your librarian if the library can help! 
• Make it known you are interested in receiving 
assistance from the library 
• Ask your IT department for information on storage and 
security available 
• Let them help you make a backup and storage plan
Learn More 
• Data Management Principles & Education: 
• Research Data MANTRA 
• DataONE: Best Practices 
• UK Data Archive 
• MIT Data Management and Publishing Guide 
• Data Management Plans 
• Digital Curation Centre 
• DMPTool2 
• DataONE: Data Management Planning
Works Cited 
Lamar Soutter Library, University of Massachusetts Medical School. 2014. 
“New England Collaborative Data Management Curriculum: Module 1.” 
http://library.umassmed.edu/necdmc. 
DataONE. 2013. “Best Practices for Data Management.” 
http://www.dataone.org/best-practices. 
MIT Libraries. 2013. “Data Management and Publishing.” MIT. 
http://libraries.mit.edu/guides/subjects/data-management/index.html. 
Office of Research Integrity. 2013. “Data Management.” United States 
Department of Health and Human Services. United States Federal 
Government. 
http://ori.hhs.gov/education/products/rcradmin/topics/data/open.shtml. 
Special thanks to Jen Ferguson, Richard Moore and Glenn Gaudette for 
permission to use their slides.

Conquering Chaos in the Age of Networked Science: Research Data Management

  • 1. Conquering Chaos in the Age of Networked Science: Research Data Management* *Adaptation of the NECDMC First Module Kathryn M. Houk, MLIS Tufts University Hirsh Health Sciences Library Wednesday June 4, 2014 Librarians: Your Partners in Research
  • 2. Today’s Objectives  Recognize what research data is and what data management entails  Recognize why managing data is important  Identify common data management issues  Learn best practices and resources for managing these issues  Learn about how the library can help you identify data management resources, tools, and best practices
  • 3. What is Data? • “Research data, unlike other types of information, is collected, observed, or created, for purposes of analysis to produce original research results” (University of Edinburgh). • Observational • Experimental • Simulation data • Derived or compiled data
  • 4. Why Should I Manage it? • Transparency & Integrity • Compliance
  • 5. Science & Personal Benefits • Who uses your data now? • Who COULD use your data? • Shared/Open Data • Scientific progress • Impact on your career • Citation counts
  • 6. What if I Don’t Consider RDM? Data Sharing and Management Snafu in 3 Short Acts: A data management horror story by Karen Hanson, Alisa Surkis and Karen Yacobucci. http://www.youtube.com/watch?v=N2zK3sAtr-4
  • 8. Data Management Plans • What types of data will be created? • Who will own, have access to, and be responsible for managing these data? • What equipment and methods will be used to capture and process data? • Where will data be stored during and after?
  • 9. Simplified Data Management Plan 1. Types of data • What types of data will you be creating or capturing? (experimental measures, observational or qualitative, model simulation, existing) • How will you capture, create, and/or process the data? (Identify instruments, software, imaging, etc. used) 2. Contextual Details (Metadata) Needed to Make Data Meaningful to others • What file formats and naming conventions will you be using? 3. Storage, Backup and Security • Where and on what media will you store the data? • What is your backup plan for the data? • How will you manage data security? 4. Provisions for Protection/Privacy • How are you addressing any ethical or privacy issues (IRB, anonymization of data)? • Who will own any copyright or intellectual property rights to the data? 5. Policies for re-use • What restrictions need to be placed on re-use of your data? 6. Policies for access and sharing • What is the process for gaining access to your data? 7. Plan for archiving and preservation of access • What is your long-term plan for preservation and maintenance of the data?
  • 10. Creating a DMP & Considering Long-Term DM Issues • Read the case study provided • Your group is assigned a set of questions (labeled Group 1-6) to answer as best you can • First set of questions are from one section of the simplified DMP • 2nd set of questions highlight an issue that arises in day-to-day or long-term management of research data (a more detailed level) • Elect a group speaker • Each group will discuss their answers • We will go over the issue associated with your section, common problems, and best practices
  • 11. Group 1 • DMP Section 1: Types of Data 1. What types (e.g. images, lists of readings, text documents) of data are being collected for this study? 2. What analytical methods and tools are being used in this study? 3. What types of data will be generated from these analytical tools and methods? • Detailed Planning 1. What naming conventions are being used in the lab? 2. Is there a structure for saving files in the lab? 3. What kind of information would you include in a naming convention for files? 4. What kinds of things would you avoid in naming/labeling files?
  • 12. Issue: Records Management • Does this sound familiar? • Inconsistently labeled files • in multiple versions… • inside poorly structured folders… • stored on multiple media… • in multiple locations… • and in various formats…
  • 14. Issue: Records Management • Best Practices: • Avoid special characters in a file name. • Use capitals or underscores instead of periods or spaces. • Use 25 or fewer characters. • Use documented & standardized descriptive information about the project/experiment. • Use date format ISO 8601:YYYYMMDD. • Include a version number.
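The naming practices on this slide can be sketched in a few lines of Python. This is only an illustration: the `Project_Experiment_YYYYMMDD_vNN` layout, the function name `build_file_name`, and the sample values are assumptions, not a convention taken from the presentation.

```python
import re
from datetime import date

MAX_LEN = 25  # the slide's suggested upper bound, applied here to the name without its extension

def build_file_name(project: str, experiment: str, day: date, version: int, ext: str) -> str:
    """Compose a name such as Heart_E12_20140604_v03.csv (hypothetical layout)."""
    parts = [project, experiment, day.strftime("%Y%m%d"), f"v{version:02d}"]
    # Drop spaces, periods, and other special characters from every part.
    cleaned = [re.sub(r"[^A-Za-z0-9]", "", part) for part in parts]
    # Underscores separate the parts instead of spaces or periods.
    stem = "_".join(cleaned)
    if len(stem) > MAX_LEN:
        raise ValueError(f"{stem!r} has {len(stem)} characters; the limit is {MAX_LEN}")
    return f"{stem}.{ext}"

print(build_file_name("Heart", "E12", date(2014, 6, 4), 3, "csv"))
```

Raising an error for over-long names, rather than silently truncating them, is a design choice made for this sketch; a lab might prefer agreed abbreviations recorded in its own documentation.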
  • 16. Group 2 • DMP Section 2: Contextual Details (Metadata) 1. What contextual details would the researcher need to document to make her data meaningful to others? 2. How would a lack of naming and labeling conventions impact later data access by other researchers and possibly herself? • Detailed Planning 1. What general information do you think is needed for scientific data to make it discoverable? (ex. Think of a search screen and a dropdown menu of where you can search for a term: Title, Author, Genre, etc.) 2. Are you aware of any metadata standards for the life or health sciences? 3. Do you think all metadata has to be hand-entered or recorded? 4. How would you ensure lab members knew to collect and record specific information in standard ways?
  • 17. Issue: Metadata • How will someone make sense of your data e.g. the cells and values of your spreadsheet? • What universal or disciplinary standards could be used to label your data? • How can you describe a data set to make it discoverable?
  • 19. Issue: Metadata • Biology and health-specific metadata examples
  • 20. Issue: Metadata • Title • Creator • Identifier • Subject • Funders • Rights • Access information • Language • Dates • Location • Methodology • Data processing • Sources • List of file names • File Formats • File structure • Variable list • Code lists • Versions • Checksums
  • 21. Issue: Metadata • Best Practices • Describe the contents of data files • Define the parameters and the units on the parameter • Explain the formats for dates, time, geographic coordinates, and other parameters • Define any coded values • Describe quality flags or qualifying values • Define missing values
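One way to put these practices to work is a small, machine-readable data dictionary stored next to the data file. The sketch below is hypothetical throughout: the variable names, codes, and the helper `undocumented_columns` are invented for illustration and do not follow any particular metadata standard.

```python
import json

# Hypothetical data dictionary for an example spreadsheet of heart-rate readings.
data_dictionary = {
    "title": "Example heart-rate readings",
    "date_format": "ISO 8601 (YYYY-MM-DD)",
    "missing_value": "NA",  # how missing values are written in the file
    "variables": [
        {"name": "subject_id", "description": "De-identified subject code", "units": None},
        {"name": "visit_date", "description": "Date of visit", "units": "YYYY-MM-DD"},
        {"name": "heart_rate", "description": "Resting heart rate", "units": "beats per minute"},
        {"name": "qc_flag", "description": "Quality flag", "units": None,
         "codes": {"0": "good", "1": "suspect", "2": "bad"}},  # coded values defined explicitly
    ],
}

def undocumented_columns(header, dictionary):
    """Return the column names that the data dictionary does not describe."""
    documented = {variable["name"] for variable in dictionary["variables"]}
    return [column for column in header if column not in documented]

header = ["subject_id", "visit_date", "heart_rate", "qc_flag", "notes"]
print(undocumented_columns(header, data_dictionary))
print(json.dumps(data_dictionary, indent=2))
```

A check like this can be run whenever a new file is saved, so that a column added in a hurry does not go undescribed.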
  • 22. Group 3 • DMP Section 3: Data Backup, Storage, and Security 1. Where and on what media will the data from each source be stored? 2. How, how often, and where will the data be backed up? 3. Are there any security concerns for the data and have they been addressed? • Detailed Planning 1. How many copies of your data do you think you should have and where should you keep them? 2. Is there any group on campus you think could help you with backup and security/access concerns? 3. What are some good data storage and backup practices you know about or practice?
  • 23. Issue: Backup & Security • How often should data be backed up? • How many copies of data should you have? • Where can you store your data? • How much server space can I get?
  • 24. Issue: Backup & Security • Best Practices • Make 3 copies (original + external/local + external/remote) • Have them geographically distributed (local vs. remote) • Use a hard drive (e.g. Vista backup, Mac Time Machine, UNIX rsync) or tape backup system • Cloud storage: some examples of private sector storage resources include Amazon S3, Elephant Drive, Jungle Disk, Mozy, and Carbonite • Unencrypted is ideal for storing your data because it will make it most easily read by you and others in the future, but if you do need to encrypt your data because of human subjects then: • Keep passwords and keys on paper (2 copies), and in a PGP (Pretty Good Privacy) encrypted digital file • Uncompressed is also ideal for storage, but if you need to compress to conserve space, limit compression to your 3rd backup copy
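Keeping three copies only helps if the copies stay identical. A minimal sketch of one way to check this, assuming SHA-256 checksums from Python's standard `hashlib` module; the file names are made up, and the "remote" copy is simulated in a temporary folder rather than on a real remote system.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def copies_match(original: Path, copies: list[Path]) -> bool:
    """Return True only if every copy has the same checksum as the original."""
    expected = sha256_of(original)
    return all(sha256_of(copy) == expected for copy in copies)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "original.csv"  # hypothetical data file
    original.write_text("subject_id,heart_rate\nS01,62\n")
    local_copy = root / "local_backup.csv"
    remote_copy = root / "remote_backup.csv"  # stands in for an off-site copy
    shutil.copy2(original, local_copy)
    shutil.copy2(original, remote_copy)
    ok_before = copies_match(original, [local_copy, remote_copy])
    remote_copy.write_text("corrupted")  # simulate damage to one copy
    ok_after = copies_match(original, [local_copy, remote_copy])
    print(ok_before, ok_after)
```

Recording the checksum alongside the other metadata for a data set lets anyone repeat the comparison later, even after the files have moved to new media.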
  • 25. Group 4 • DMP Sections 4. Data protection/privacy and 5. Policies for reuse of data 1. How is the lab addressing any privacy or ethical issues? 2. Who will own any copyright or intellectual property rights to the data? 3. Are there any restrictions to the reuse of the data? • Detailed Planning 1. Are there any reasons to not share or reuse data? Are these ethical or cultural issues? 2. Will having public funding affect data sharing and reuse differently than having private funding? 3. Who has the right to make decisions about reuse of your data?
  • 26. Issue: Ownership & Retention • Intellectual Property Policy • IRB data retention policy • Funders’ data retention policy • Publishers’ data retention policy • Federal and State laws
  • 27. Issue: Ownership & Retention • How long is long enough?
  • 28. Issue: Ownership & Retention • IRB OHRP Requirements: 45 CFR 46 requires research records to be retained for at least 3 years after the completion of the research. • HIPAA Requirements: Any research that involved collecting identifiable health information is subject to HIPAA requirements. As a result, records must be retained for a minimum of 6 years after each subject signed an authorization. • FDA Requirements (21 CFR 312.62(c)): Any research that involved drugs, devices, or biologics being tested in humans must have records retained for a period of 2 years following the date a marketing application is approved for the drug for the indication for which it is being investigated; or, if no application is to be filed or if the application is not approved for such indication, until 2 years after the investigation is discontinued and FDA is notified. • VA Requirements: At present, records for any research that involves the VA must be retained indefinitely per VA federal regulatory requirements. • Intellectual Property Requirements: Any research data used to support a patent must be retained for the life of the patent in accordance with Intellectual Property Policy. • Check your funder's and publisher's requirements. • Questions of data validity: If there are questions or allegations about the validity of the data or appropriate conduct of the research, you must retain all of the original research data until such questions or allegations have been completely resolved.
  • 29. Group 5 • DMP Section 6: Policies for access and sharing 1. How will others be able to gain future access to the study data? 2. How does the graduate student plan to link her datasets to her published article? • Detailed Planning 1. Could there be a use for the graduate student’s data that was not used in the published article? 2. Are the data the student collected in open or proprietary formats (will people need specialized software to access and interpret the data)? a) How would this affect future accessibility & reuse?
  • 30. Group 6 • DMP Section 7: Plan for archiving and preservation of access 1. What is the long-term strategy for maintaining, curating and archiving the data? 2. Where will the data be stored? 3. What contextual data (data that describes your data) or other related data will be included in the archive? • Detailed Planning 1. What data should be included in an archive? 2. Do you know of any data repositories that you could use for your data? 3. How can you ensure that your data is discoverable and interpretable? 4. How long should the data be maintained? What factors affect the length of time you retain your data?
  • 31. Issue: Long-Term Planning • What will happen to my data after my project ends? • How can I appraise the value of my data? • What are my options for archiving and preserving my data? • What are my options for publishing and sharing data?
  • 32. Data Formats • Is the file format open (i.e. open source) or closed (i.e. proprietary)? • Is a particular software package required to read and work with the data file? If so, the software package, version, and operating system platform should be cited in the metadata • Do multiple files comprise the data file structure? If so, that should be specified in the metadata
  • 33. Open vs. Proprietary Formats Used in Research Labs
  • 34. Issue: Long-Term Planning • Best Practices • When choosing a file format, select a consistent format that can be read well into the future and is independent of changes in applications. • Non-proprietary: Open, documented standard, Unencrypted, Uncompressed, ASCII formatted files will be readable into the future.
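As a rough first check on the "ASCII formatted" recommendation, a file's bytes can be tested directly. This sketch only tells you whether every byte is in the ASCII range; it says nothing about whether the format is open or documented, and the file names are hypothetical.

```python
import tempfile
from pathlib import Path

def is_plain_ascii(path: Path) -> bool:
    """Return True if every byte in the file is in the 7-bit ASCII range."""
    return path.read_bytes().isascii()

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    text_file = root / "readings.csv"    # hypothetical plain-text export
    text_file.write_text("subject_id,heart_rate\nS01,62\n", encoding="ascii")
    binary_file = root / "readings.bin"  # stands in for an instrument's binary output
    binary_file.write_bytes(bytes([0x89, 0x00, 0xFF, 0x10]))
    ascii_result = is_plain_ascii(text_file)
    binary_result = is_plain_ascii(binary_file)
    print(ascii_result, binary_result)
```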
  • 35. Issue: Long-Term Planning • Librarians can help: • Identify file formats suitable for long-term preservation • Interpret your funder or publisher’s repository requirements • Find and evaluate a suitable repository for your data • Upload your data sets to a repository • Make your data in a repository searchable and discoverable • Create a DOI and persistent ID • Choose metadata standards for increased discoverability
  • 36. Issue: Data Stewardship • Challenges • Team Science • Managing Laboratory Notebooks • Rotating Lab Personnel
  • 37. Issue: Data Stewardship • Best Practices • Define roles and assign responsibilities for data management • Identify skills needed to perform tasks outlined in DMP and match to available staff • Develop training plans for continuity • Assign responsible parties and monitor results
  • 38. How the Library Can Help: • Teach you, your lab, or your classes about data management best practices • Write a data management and/or sharing plan • Comply with federal, funder, and publisher data sharing policies • Find & submit your data to a repository • Find standards to describe & label your data & data files • Find a data set • Cite others’ data • Publish a data set • Get a DOI for a data set • Measure the citation impact of your data set • Build a collection of research data that others can search & access • Archive & preserve your data • Learn about copyright & license issues surrounding your data
  • 39. Find Help • Ask your librarian if the library can help! • Make it known you are interested in receiving assistance from the library • Ask your IT department for information on storage and security available • Let them help you make a backup and storage plan
  • 40. Learn More • Data Management Principles & Education: • Research Data MANTRA • DataONE: Best Practices • UK Data Archive • MIT Data Management and Publishing Guide • Data Management Plans • Digital Curation Centre • DMPTool2 • DataONE: Data Management Planning
  • 41. Works Cited Lamar Soutter Library, University of Massachusetts Medical School. 2014. “New England Collaborative Data Management Curriculum: Module 1.” http://library.umassmed.edu/necdmc. DataONE. 2013. “Best Practices for Data Management.” http://www.dataone.org/best-practices. MIT Libraries. 2013. “Data Management and Publishing.” MIT http://libraries.mit.edu/guides/subjects/data-management/index.html. Office of Research Integrity. 2013. “Data Management.” United States Department of Health and Human Services. United States Federal Government. http://ori.hhs.gov/education/products/rcradmin/topics/data/open.shtml. Special thanks to Jen Ferguson, Richard Moore and Glenn Gaudette for permission to use their slides.

Editor's Notes

  1. There are a number of definitions for ‘research data,’ but this is my favorite. Data covers a broad range of types of information. Can you think of any other types of data that get created during research? Documents (text, Word), spreadsheets; laboratory notebooks, field notebooks, diaries; questionnaires, transcripts, codebooks; survey responses; health indicators such as blood cell counts and vital signs; audio and video recordings; images, films; protein or genetic sequences.
  2. You may be required by a funder or publisher to maintain the data that underlies your published works and findings. Managing data is a part of compliance with the University’s IRB, and your funders’ data sharing and data management policies. Funders like the NIH reserve the right to audit your lab notebooks and pre-publication data. Since 2011 the NSF has required a data management plan, and the federal government is currently working to make publicly funded research data available to the public. The Fair Access to Science and Technology Research (FASTR) Act is a bipartisan effort aiming to make data from federally funded research more open and accessible. “The Administration is committed to ensuring that…the direct results of federally funded scientific research are made available to and useful for the public, industry, and the scientific community. Such results include peer-reviewed publications and digital data” (Holdren 2013, “Expanding Public Access to the Results of Federally Funded Research,” Office of Science and Technology Policy). Publications, private foundations, and specific funders, like the American Heart Association, may also require data management provisions.
  3. Managing data saves you time and effort, and avoids the duplication of efforts: “good RDM = good research”. You can easily find the data you need and make it available should you be asked. In addition, publishing your data can increase your citation impact and the discoverability of your research, and help with promotion and tenure. You don’t know how someone else may use your data in the future. (Anna Gold. “Cyberinfrastructure, Data, and Libraries, Part 1: A Cyberinfrastructure Primer for Librarians.” D-Lib Magazine, September/October 2007, Volume 13 Number 9/10. http://www.dlib.org/dlib/september07/gold/09gold-pt1.html.) “Managing and sharing data… increases the impact and visibility of research; promotes innovation and potential new data uses; leads to new collaborations between data users and creators; maximizes transparency and accountability; enables scrutiny of research findings; encourages improvement and validation of research methods; reduces cost of duplicating data collection; and provides important resources for education and training.” In short: increase the visibility of your research; save time; simplify your life; preserve your data; increase your research efficiency; documentation; meet grant requirements; facilitate new discoveries; support Open Access.
  4. This video from NYU lays a solid groundwork for the issues we will discuss today. In it are several scenarios that highlight data management issues that were identified by the Department of Health and Human Services’ Office of Research Integrity.
  5. Data has a life that extends beyond the project where it was created. This cycle helps you to visualize the activities in order to plan for the project’s data management needs and how data may be collected, stored, described, preserved, and/or shared. While a 2-page plan for a grant application is very important, every research project will benefit from planning for managing a project’s data throughout the life of the project, including planning for how data will be produced, collected, analyzed, stored, archived or shared, etc. The DMP is just a snapshot – an executive summary – compared to a comprehensive data management policy for your lab.
  6. Many research funders require that you have a plan to manage and/or share your data. For example, in 2011 the NSF began requiring a 2-page data management supplement with all submitted grant applications. The NIH requires a plan for projects in excess of $500,000. These are some questions that are commonly addressed in a data management plan. The NSF has laid the foundation for requiring a data management and sharing plan. You have a copy of a simplified data management plan. It has 7 sections, each with at least one question that should be answered to satisfy the requirements for a 2-page data management document.
  7. This simplified Data Management Plan (DMP) is based on the NSF recommendations for its required 2-page data management plan. If you can answer the questions in the 7 sections of this plan, then you will likely be able to meet any other data management requirements. Some of the sections can be standardized based on your institutional practices. For example, at Tufts, we have standardized language in section 3 because most of our researchers use a University research drive to store and back up their data. Remember that this is an executive summary for people who do not necessarily know much about your research or the process you will use. It is meant to be a broad, high-level overview. Going into too much detail could be counter-productive, as it may make the document too long and too confusing for grant reviewers.
  8. Provided is a case study from a real researcher and their project. This means that the questions you have may not be answered explicitly. The researcher may not be aware of the need or did not mention it in their interview. You are split into groups based on the sections of the simplified DMP. For example, Group 1 will answer Part 1 of the DMP based on the information in the case study, but will also have a second set of questions that dive into more detailed issues that arise in your day-to-day or long-term management of data. The second set of questions are more reflective and you can use your own experience and knowledge to answer them. The second set of questions also highlight issues that often arise when starting to think about data management holistically. When we come back together, your team speaker will tell us your questions and answers and any other insights or questions the group had. We will then talk about the issue related to your group and some Best Practices in regards to that issue before moving on to the next group. You may not be able to answer everything with confidence – that’s OK! It’s only practice to help get you thinking about these topics and discussing with your colleagues. I will be walking around the room if you need clarification. You have 20 minutes to discuss and prepare.
  9. If we think back to the video, we realize that a lot of the issues regarding data management relate to inconsistent and confusing file and folder labels, saving data in multiple locations, and not thinking about how someone might find and make sense of your data. Records management requires thinking about how you and others can both easily find and make sense of your data.
  10. This slide comes from a colleague at Northeastern University. She looked at a sample of data files produced by students collecting data in a bioscience lab. As you can see, their file naming conventions do not always take into consideration how someone not involved in a project will make sense of what is in the file. After some time, these files would probably not even make sense to the person involved in creating the file!
  11. These are some best practices for creating file names. Poorly constructed file names can cause issues when transferring files from one format to another, or to another operating system. For example, a researcher recently found that when she moved files from REDCap™ to her analysis software, the dates were reformatted. In addition, operating systems like Unix can have issues reading files with spaces or special characters.
  12. Here is an example from a biomedical engineering lab that shows how you can add project information to the file name. Notice that he labels each file with an experiment that links back to the laboratory notebook. There could be multiple people in the lab and multiple experiments involving the same sample, but a systematic approach to labeling and mapping files allows for the efficient retrieval and interpretation of the data.
  13. Thanks to the NSA & Ed Snowden scandal, ‘metadata’ has become a household word! Often described as data about data, metadata are simply descriptors that can help you record information and create labels to catalog and make sense of your data. Metadata standards can be used to describe the data’s field labels, their values, elements and parameters, and they can also describe the nature of the files that are produced, such as how many bytes, the format, the software used to create the file, the version, and who created it.
  14. Here is an example of metadata collected about a data set. It states who created the file and when, the format, and descriptive information about the data and its location.
  15. Using REDCap you can upload or create a data dictionary to define the fields, elements, and parameters for your data collection. Here is another example of metadata from a dataset uploaded to the NCBI “Flybase”. It incorporates a large amount of scientific disciplinary information such as the strain, tissue, and cell line used in the sample. Most databases where you upload your data will inform you of the basic metadata they require. Many data repositories actually have experts that create more metadata after you submit in order to make it findable and interpretable.
  16. Here is a list of common metadata fields associated with a data set.
  17. IRB guidelines and IT departments can help you learn where and how to best store & back up your data.
  18. Electronic data should be saved on a device that has the appropriate security safeguards such as unique identification of authorized users, password protection, encryption, automated operating system patch (bug fix), anti-virus controls, firewall configuration, and scheduled and automatic backups to protect against data loss or theft.
  19. When it comes to data ownership and data retention there are a lot of overlapping policies. IP policy can cover the ownership and retention of data related to patents, the IRB wants to ensure that documentation of human subjects’ data are retained and/or destroyed appropriately, and the funders and publishers want you to retain data to defend the integrity of your findings, and then there are federal guidelines like HIPAA.
  20. “How long should I retain data?” is not a clear-cut data management question. Last June, for example, the Journal of Clinical Investigation retracted a published article after 6 years because one of its data tables was duplicated. The publisher contacted the researchers to have them update the data, but they could not locate the original data files after six years, so the journal was forced to issue a retraction. This case highlights how difficult it is to know how long to keep data. This article was peer reviewed and cited over 55 times, but it took six years for the representation of its data to be called into question. Thus, thinking about ways to digitize documents and store and preserve electronic files of data in a self-archived, disciplinary, or local data repository is important, and is one of the many tasks the library can help you with. RetractionWatch.com has three different categories dealing with retractions due to data misconduct: fabrication of data, duplication of data, and manipulation of figures/images. There’s also the issue of non-reproducible results. Honest researchers can avoid all of these issues through good data management and preservation practices.
  21. As you can see, data retention is very situation dependent. A sponsor may require you to retain your research related documents. Prior to agreeing to a contract that specifies how long records will be maintained, you should ensure that you will receive adequate funding to pay for their storage and preservation. The library can help provide guidance on long-term preservation. Again, it would be prudent to check with the sponsor, IRB, and Office of Research before destroying any records…
  22. After a project you may want to consider appraising, and publishing or depositing your data in a repository. There are a variety of factors that impact your ability to share data with outside parties. According to the OHRP, you should contact the IRB prior to proceeding with a release of human subject data unless (a) your subjects signed an IRB approved consent document with HIPAA compliant authorization language that clearly details what information will be collected, used, and disclosed and (b) the outside party is specified in the document. Archiving & Preserving versus Storage: there’s a big difference! Digital data degrades if it is not properly taken care of. Depositing data in a repository for preservation and open access ensures that data will be properly cared for throughout the rest of its life, however long you determine that to be. If it is forever, then the repository will migrate the data onto the newest, most abundant storage media and convert it into a format that can be interpreted by computers in the future. (Think of the 3.5in floppy, the zip drive, the cassette, etc.)
  23. One of the greatest challenges for preservation is thinking ahead about the formats of your data. Type of data is what it is – an image, a survey, etc. Format is how it is encoded by a computer – jpg, doc, txt, etc. Some formats are produced by a specific software that is owned by a particular company. If that software becomes obsolete, so does the ability to read the file formatting and the information contained in that digital object.
  24. This graphic was created by a colleague who observed the number of instruments in just one biomedical lab relying on proprietary software. This means that to be able to open and view these files, someone would need to know the software that created them, and be able to access that software. Thus converting your files to open source and sustainable formats and standards is essential for long-term sharing, preservation and access.
  25. One of the greatest challenges in managing data is the distributed nature of modern research. With so many responsibilities, it is easy to not prioritize data management. By assigning data management tasks, you will increase the efficiency of your research. Laboratory notebooks, paper and electronic, may be audited by the funder, such as NIH. Managing and preserving these notebooks requires a plan. In many labs personnel are changing constantly. There must be a plan to bridge the data management knowledge of new and outgoing students, post-docs, and staff.
  26. Unless the distribution of responsibility is clear, misunderstandings can result and compliance can be jeopardized. We hear a lot from students that they have had to learn DM on the go and may have little to no formal training on how to manage a specific project’s data, so do not be afraid to ask for clarification, and to ask that DM roles & responsibilities be documented and formalized. This is an important aspect of a DM plan.