This document summarizes a seminar on data management for undergraduate researchers. It discusses what data is, why it needs to be managed, and key aspects of effective data management including data organization, metadata, storage and archiving. Specific topics covered include creating data management plans, file naming conventions, structuring folders, describing data through codebooks and documentation, backup strategies, and long-term archival options. The goal is to help researchers organize and document their data so it can be understood and preserved over time.
A presentation on research data management presented at the Utah Library Association conference in May 2015. Main topics included federal mandates, data repositories, metadata, and file naming conventions. Presenters: Rebekah Cummings, Elizabeth Smart, Becky Thoms, and Brit Faggerheim.
This is the PowerPoint for my "Data Management for Undergraduate Researchers" workshop for the Office of Undergraduate Research Seminar and Workshop Series. Major topics include motivations behind good data management, file naming, version control, metadata, storage, and archiving.
Who owns the data? Intellectual property considerations for academic research (Rebekah Cummings)
Intellectual property (IP) is often complicated but is even more so as it pertains to data, as “facts” are not eligible for copyright protection under United States copyright law. The IP issues surrounding data in academic research environments are often exacerbated by the fact that data ownership has rarely been discussed in university environments prior to NSF’s data management plan requirement in 2011. Researchers retained custody over their datasets and other stakeholders – namely universities and funding agencies – rarely contested ownership. Now, as datasets are increasingly seen as valuable outputs of research alongside publications, questions of data ownership are coming to the fore. This presentation will frame the complex issues surrounding data ownership in an academic research setting and will discuss strategies for educating and advising your researchers on intellectual property issues related to research data.
This presentation was provided by Melissa Levine of the University of Michigan during a NISO Virtual Conference on the topic of data curation, held on Wednesday, August 31, 2016
Data Citation Implementation Guidelines by Tim Clark (datascienceiqss)
This talk presents a set of detailed technical recommendations for operationalizing the Joint Declaration of Data Citation Principles (JDDCP) - the most widely agreed set of principle-based recommendations for direct scholarly data citation.
We will provide initial recommendations on identifier schemes, identifier resolution behavior, required metadata elements, and best practices for realizing programmatic machine actionability of cited data.
We hope that these recommendations along with the new NISO JATS document schema revision, developed in parallel, will help accelerate the wide adoption of data citation in scholarly literature. We believe their adoption will enable open data transparency for validation, reuse and extension of scientific results; and will significantly counteract the problem of false positives in the literature.
The liaison librarian: connecting with the qualitative research lifecycle (Celia Emmelhainz)
A discussion of user needs in anthropology and ways in which academic liaison librarians could support the lifecycle of qualitative research in a holistic way.
DataONE Education Module 10: Legal and Policy Issues (DataONE)
Lesson 10 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/educaiton-modules. Released under a CC0 license; attribution and citation requested.
February 18, 2015 NISO Virtual Conference
Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Network Effects: RMap Project
Sheila M. Morrissey, Senior Researcher, ITHAKA
Research Data Management in the Humanities and Social Sciences (Celia Emmelhainz)
This two-part presentation for librarians reviews basic concepts and concerns with research data management, and is targeted to those working with humanists and social scientists. You are free to re-use and modify with attribution.
February 18, 2015 NISO Virtual Conference: Scientific Data Management: Caring for Your Institution and its Intellectual Wealth
Learning to Curate Research Data
Jennifer Doty, Research Data Librarian, Emory Center for Digital Scholarship, Emory University, Robert W. Woodruff Library
Data Publishing Models by Sünje Dallmeier-Tiessen (datascienceiqss)
Data Publishing is becoming an integral part of scholarly communication today. Thus, it is indispensable to understand how data publishing works across disciplines. Are there best practices others can learn from or even data publishing standards? How do they impact interoperability in the Open Science landscape? The presentation will look at a range of examples, and the main building blocks of data publishing today. The work has been conducted as part of the RDA Data Publishing Workflows group.
An analysis and characterization of DMPs in NSF proposals from the University... (Megan O'Donnell)
Beginning in July 2011, the University of Illinois at Urbana-Champaign Library, working in conjunction with the campus Office of Sponsored Programs and Research Administration (OSPRA), began an analysis of Data Management Plans (DMPs) in newly submitted National Science Foundation (NSF) grant proposals. The DMP became a required element in all NSF proposals beginning on January 18, 2011. This analysis was undertaken to provide the Illinois campus and library with detailed information on the DMPs being submitted by Illinois researchers. In particular, the analysis allows us to categorize the grant applicant’s proposed DMP data storage venues and data reuse mechanisms, and provides us with data on the use of DMP templates developed by the University of Illinois Library.
This session covers topics related to data archiving and sharing. This includes data formats, metadata, controlled vocabularies, preservation, archiving and repositories.
Presentation from a University of York Library workshop on research data management. The workshop provides an introduction to research data management, covering best practice for the successful organisation, storage, documentation, archiving, and sharing of research data.
The workshop covers:
Introduction: What Is “Research Data”? and Data Lifecycle
Part 1:
Why Manage Your Data?
Formatting and organizing the data
Storage and Security of Data
Data documentation and meta data
Quality Control
Version control
Working with sensitive data
Controlled Vocabulary
Centralized Data Management
Part 2:
Data sharing
What are publishers & funders saying about data sharing?
Researchers’ Attitudes
Benefits of data sharing
Considerations before data sharing
Methods of Data Sharing
Shared Data Uses and Their Limitations
Data management plans
Brief summary
Acknowledgments, References
Feb 26 NISO Training Thursday
Crafting a Scientific Data Management Plan
About the Training
Addressing a data management plan for the first time can be an intimidating exercise. Join NISO for a hands-on workshop that will guide you through the elements of creating a data management plan, including gathering necessary information, identifying needed resources, and navigating potential pitfalls. Participants explore the important components of a data management plan and critique excerpts of sample plans provided by the instructors.
This session is meant to be a guided, step-by-step session that will follow the February 18 NISO Virtual Conference, Scientific Data Management: Caring for Your Institution and its Intellectual Wealth.
About the Instructors
Kiyomi D. Deards, MSLIS, Assistant Professor, University of Nebraska-Lincoln Libraries
Jennifer Thoegersen, Data Curation Librarian, University of Nebraska-Lincoln Libraries
Webinar for the Mountain West Digital Library on how to turn your digital collections into datasets for digital humanities research. Includes a case study of the University of Utah Marriott Library and four digital collections we made available as datasets.
“Data? I don’t have data” is a common refrain for researchers working in the arts and humanities. Yet whether or not you consider yourself a “digital humanist,” the reality is that most of us are working digitally now, and there are different techniques for managing digital research assets than physical ones. This workshop explores how scholars of all stripes can add value to their research by making the products of their work more organized, transparent, usable, and ethical. In addition to instruction in best practices for managing research assets, participants of this workshop will create a short “data management plan,” excellent practice for fulfilling the NEA, NEH, and IMLS data management plan grant requirement!
Finding, Evaluating, and Using Quality Information (Rebekah Cummings)
How to find, evaluate, and capture quality information. Lecture and workshop for undergraduate students. Covers fake news, media bias, strategies for evaluating websites, use of library resources, and capturing resources in Zotero.
Worth a Thousand Words: Finding, Evaluating, and Using Historical Images (Rebekah Cummings)
45 minute lecture and interactive discussion on finding, evaluating, using, and citing images for historical research. Includes short discussions on copyright, fair use, Creative Commons licenses, and attribution. Presentation created for a first year information literacy college class.
45 minute lecture and interactive discussion about the purpose of newspapers, journalism ethics, fake news, bias, and the role of a reader in parsing real news from fake news. Created for a first year college information literacy class.
Level Up! Building data services at the Marriott Library (Rebekah Cummings)
Research data services have become a common fixture in academic libraries, yet many libraries still struggle to develop an appropriate and in-demand mix of services to support their research community. While an elite few offer seemingly endless curatorial assistance, the majority of libraries are building basic to mid-level services such as DMP support, workshops, and consultations. This case study provides a detailed look at the University of Utah Marriott Library’s data services, the rationale behind our current service model, the results of our campus data needs assessment, and how we plan to grow our technical infrastructure into the future. In addition to an overview of our data service mix, we will look closely at one current initiative, the Entertainment, Arts, and Engineering (EAE) Thesis Preservation Project, which highlights curation challenges such as irregular and proprietary file formats, copyright restrictions, long-term preservation, and a lack of appropriate metadata standards. This presentation will highlight the Marriott Library’s data curation accomplishments to date alongside an honest assessment of ongoing challenges.
Your digital humanities are in my library! No, your library is in my digital ... (Rebekah Cummings)
A presentation on the intersection of libraries and digital humanities presented at the Utah Digital Humanities Symposium at Utah Valley University on February 26, 2016.
A 40 minute presentation and demo on how to use bibliographic management systems. This presentation also included extensive demonstrations in Zotero and EndNote.
Since Wikipedia launched in 2001, librarians have maintained a cautious and, at times, hostile relationship with the online, crowd-sourced encyclopedia. Librarians have largely ignored Wikipedia, citing it as an unreliable and non-authoritative resource, and steering information seekers toward traditional reference materials. While librarians waged this quiet war, Wikipedia has gained increasing dominance as an information resource, and is now the indisputable starting point for most quick research. In this presentation, attendees will learn how to wield the power of Wikipedia in their libraries and embrace Wikipedia as an information resource. Presenters will discuss how to use Wikipedia for reference and instruction, linking online resources, increasing search engine optimization, and creating linked data for the semantic web. Presenters will also discuss the great need for librarians to delve into the world of Wikipedia as researchers and contributors; including the ethics of contributing to Wikipedia. Presenters: Dustin Fife, Rebekah Cummings, Jessica Breiman
Summary report of ACRL webinar on emerging technologies (Rebekah Cummings)
Summary report of the ACRL webinar on emerging technologies in libraries. Reported to the University of Utah RLS Forum in May 2015 and the Marriott Library All-Staff meeting in June 2015.
Hosting Hubs Update: Services, Pricing, and Highlights (Rebekah Cummings)
Mountain West Digital Library (MWDL) provides a central search portal to over 800,000 digital resources from memory institutions in Utah, Nevada, Idaho, Arizona, and Hawaii. MWDL partners typically work with one of approximately 30 MWDL hosting hubs. Hubs assist partners by providing digital collections training, digitization services, and repository hosting services. Through the hubs model MWDL supports a distributed digital collections network around the Mountain West and works to expand digital library services to additional memory institutions in the region.
In this webinar, Sandra and Rebekah will provide background on the hubs model, explain the different kinds of MWDL hubs, and discuss the need to update the current model of service. Time will be allotted for questions and discussions about the needs of both hubs and partners, and for ideas about how MWDL can modify the hubs model in the future.
MWDL as a Service Hub for the Digital Public Library of America: Updates and ... (Rebekah Cummings)
In this presentation, Sandra and Rebekah talk about how MWDL became a Service Hub for the DPLA and what being a Service Hub entails. They will also discuss upcoming MWDL/DPLA announcements and events such as the digitization mini-contracts program and the DPLA Community Representatives program.
Welcome to the Mountain West Digital Library: Update for New Partners (Rebekah Cummings)
In this webinar, Sandra and Rebekah talk about how the MWDL network came together and how partners work together across the region. They will also discuss how to join the Mountain West Digital Library, what it means to be an MWDL partner, and the benefits of partnership.
A presentation to the Research and Learning Services department at the University of Utah. The 20 minute presentation included an overview of the Mountain West Digital Library and the Digital Public Library as research resources and ended with live demonstrations on how to navigate both interfaces effectively.
PowerPoint for a junior high Career Day at which I presented. There are several slides dispelling stereotypes about librarians, followed by a few slides on what librarians are and where we work. Lastly, I spoke about my job as the Assistant Director of the Mountain West Digital Library and why Google is not enough (namely, because of metadata).
Data Management for Undergraduate Researchers (updated - 02/2016)
1. Data Management for Undergraduate Researchers
Office of Undergraduate Research Seminar and Workshop Series
Rebekah Cummings, Research Data Management Librarian
J. Willard Marriott Library, University of Utah
February 23, 2016
2. • Introductions
• What are data?
• Why manage data?
• Data Management Plans
• Data Organization
• Metadata
• Storage and Archiving
• Questions
4. What is data management?
The process of controlling the information (read: data) generated during a research project.
https://www.libraries.psu.edu/psul/pubcur/what_is_dm.html
5. What are data?
“The recorded factual material commonly accepted in the research community as necessary to validate research findings.”
- U.S. OMB Circular A-110
8. Why manage data?
• Save time and work more efficiently
• Meet grant requirements
• Promote reproducible research
• Enable new discoveries from your data
• Make the results of publicly funded research publicly available
10. Two Bears’ data management problems
1. Didn’t know where he stored the data
2. Saved only one copy of the data, on a USB drive
3. Data was in a format that could only be read by outdated, proprietary software
4. No codebook to explain the variable names
5. Variable names were not descriptive
6. No contact information for the co-author Sam Lee
11. Scenario
You develop a research project during your undergraduate experience. You write up the results, which are accepted by a reputable journal. People start citing your work! Three years later someone accuses you of falsifying your work.
Scenario adapted from MANTRA training module
12. • Would you be able to prove you did the work as you described in the article?
• What would you need to prove you hadn’t falsified the data?
• What should you have done throughout your research study to be able to prove you did the work as described?
13. Data Management Plans
• What data are generated by your research?
• What is your plan for managing the data?
• How will your data be shared?
14. Elements of a DMP
• Types of data, including file formats
• Data description
• Data storage
• Data sharing, including confidentiality or security restrictions
• Data archiving and responsibility
• Data management costs
19. File naming best practices
1. Be descriptive, not generic
2. Appropriate length (about 25 characters or less)
3. Be consistent
4. Think critically about your file names
20. File naming best practices
• Files should include only letters, numbers, and underscores/dashes.
• No special characters
• No spaces; use dashes, underscores, or camel case (like-this or likeThis)
• Avoid case dependency. Assume this, THIS, and tHiS are the same.
• Have a strategy for version control.
• Don’t overwrite file extensions
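The naming rules above can be checked mechanically. This is a small Python sketch (not from the slides); the `is_good_filename` helper and the 25-character cap on the name stem are illustrative assumptions based on the guidelines:

```python
import re

# Allowed characters per the guidelines: letters, digits, underscores, dashes.
VALID_NAME = re.compile(r"^[A-Za-z0-9_-]+$")

def is_good_filename(name, max_stem_len=25):
    """Hypothetical checker for the workshop's file naming guidelines."""
    stem, _, ext = name.rpartition(".")
    if not stem or not ext:      # require an extension; don't overwrite it
        return False
    if len(stem) > max_stem_len:  # keep names to about 25 characters
        return False
    return bool(VALID_NAME.match(stem))  # no spaces or special characters

print(is_good_filename("20140724_SoilSamples_v2.csv"))   # True
print(is_good_filename("July 24 2014_SoilSamples%.csv")) # False: spaces, '%'
```

A script like this could be run over a project folder before archiving to flag files that break the convention.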
22. Version Control - Numbering
Without leading zeros, file names sort out of order: 1, 10, 2, 3, 9, 99
With leading zeros, they sort correctly: 001, 002, 003, 009, 010, 099
Use leading zeros for scalability.
Bonus Tip: Use ordinal numbers (v1, v2, v3) for major version changes and decimals for minor changes (v1.1, v2.6)
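To see why leading zeros matter, you can reproduce the two sort orders in a couple of lines of Python (a quick illustration, not part of the slides):

```python
# Plain string sorting is character by character, so "10" lands before "2".
unpadded = sorted(["1", "10", "2", "3", "9", "99"])
# Zero-padding makes all names the same width, so string order = numeric order.
padded = sorted(["001", "010", "002", "003", "009", "099"])

print(unpadded)  # ['1', '10', '2', '3', '9', '99']
print(padded)    # ['001', '002', '003', '009', '010', '099']
```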
23. Version Control - Dates
If using dates, use YYYYMMDD
June2015 = BAD!
06-18-2015 = BAD!
20150618 = GREAT!
2015-06-18 = This is fine too
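The reason YYYYMMDD is "GREAT!" is that such stamps sort chronologically even as plain strings. A short Python sketch (illustrative, not from the slides):

```python
from datetime import date

# YYYYMMDD strings sort in true date order; "06-18-2015" style does not.
stamps = ["20150618", "20141201", "20150107"]
print(sorted(stamps))  # ['20141201', '20150107', '20150618']

# Generating a stamp for today's file name:
today_stamp = date.today().strftime("%Y%m%d")
print(today_stamp)
```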
24. From a DMP…
“Each file name, for all types of data, will contain the project acronym PUCCUK; a reference to the file content (survey, interview, media) and the date of an event (such as the date of an interview).”
26. Who filed better?
• July 24 2014_SoilSamples%_v6
• 20140724_NSF_SoilSamples_Cummings
• SoilSamples_FINAL
27. Structuring folders and files
• Consider all the types of files you will handle during the course of your project.
• Develop a nested folder structure that makes sense for your project and your team’s retrieval needs.
• Name folders clearly, without special characters (avoid redundancy).
• Use a standard folder structure for each project or subproject (including making folders for files not yet created).
• Create a reference document (README file) that notes the purpose of different folders.
University of Massachusetts Medical School Library http://libraryguides.umassmed.edu/file_management
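A standard structure like the one described above can even be scripted, so every new project starts out the same way. This is a minimal sketch; the particular folder names (`data/raw`, `docs`, and so on) are illustrative assumptions, not prescribed by the workshop:

```python
import tempfile
from pathlib import Path

def scaffold(project_root):
    """Create a hypothetical standard project layout plus a README."""
    root = Path(project_root)
    # Folders are created up front, including ones for files not yet created.
    for sub in ["data/raw", "data/processed", "docs", "scripts", "outputs"]:
        (root / sub).mkdir(parents=True, exist_ok=True)
    # README noting the purpose of each folder, per the last bullet above.
    (root / "README.txt").write_text(
        "data/raw: original, untouched data\n"
        "data/processed: cleaned or derived data\n"
        "docs: protocols, consent forms, codebooks\n"
        "scripts: analysis code\n"
        "outputs: figures and tables\n"
    )
    return root

project = scaffold(tempfile.mkdtemp(prefix="demo_project_"))
print(sorted(p.name for p in project.iterdir()))
# ['README.txt', 'data', 'docs', 'outputs', 'scripts']
```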
30. Research Documentation
• Grant proposals and related reports
• Applications and approvals (e.g. IRB)
• Codebooks, data dictionaries
• Consent forms
• Surveys, questionnaires, interview protocols
• Transcripts, hard copies of audio and video files
• Any software or code you used (no matter how insignificant or buggy)
31. Three levels of documentation
• Project level – what the study set out to do: research questions, methods, sampling frames, instruments, protocols, members of the research team
• File or database level – how all the files relate to one another. A README file is a classic way of capturing this information.
• Variable or item level – full label explaining the meaning of each variable.
http://datalib.edina.ac.uk/mantra/documentation_metadata_citation/
33. What goes in a codebook?
• Variable name
• Variable meaning
• Variable data types
• Precision of data
• Units
• Known issues with the data
• Relationships to other variables
• Null values
• Anything else someone needs to better understand the data
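A codebook can be as simple as a CSV file with one row per variable. The sketch below writes a hypothetical single-row codebook whose columns mirror the checklist above; the variable `soil_ph` and all its values are made up for illustration:

```python
import csv
import io

# Column names follow the codebook checklist above.
fields = ["variable", "meaning", "dtype", "units", "precision",
          "null_value", "related_to", "known_issues"]
rows = [{
    "variable": "soil_ph",                      # hypothetical variable
    "meaning": "Soil pH at the sample site",
    "dtype": "float",
    "units": "pH",
    "precision": "0.1",
    "null_value": "-999",
    "related_to": "site_id",
    "known_issues": "Meter recalibrated mid-study; earlier readings drift",
}]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=fields)
writer.writeheader()
writer.writerows(rows)
codebook_csv = buf.getvalue()
print(codebook_csv)
```

CSV keeps the codebook readable in any spreadsheet or text editor, which matters for long-term preservation.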
35. Metadata
Unstructured data: There was a study put out by Dr. Gary Bradshaw from the University of Nebraska Medical Center in 1982 called “Growth of Rodent Kidney Cells in Serum Media and the Effect of Viral Transformation on Growth”. It concerns the cytology of kidney cells.
Structured data:
Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth
Author: Gary Bradshaw
Date: 1982
Publisher: University of Nebraska Medical Center
Subject: Kidney -- Cytology
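In practice, structured metadata like the record above is often stored in a machine-readable format such as JSON. This is a sketch of the same record; the lowercase key names follow a Dublin-Core-like pattern and are an assumption for illustration:

```python
import json

# The structured-data example from the slide, as a machine-readable record.
record = {
    "title": ("Growth of rodent kidney cells in serum media and the "
              "effect of viral transformations on growth"),
    "creator": "Gary Bradshaw",
    "date": "1982",
    "publisher": "University of Nebraska Medical Center",
    "subject": "Kidney -- Cytology",
}

print(json.dumps(record, indent=2))
```

Once metadata is structured this way, repositories and search tools can index, filter, and exchange it automatically, which is exactly what the free-text version does not allow.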
41. Language from a DMP
“All data files will be stored on the University server that is backed up nightly. The University's computing network is protected from viruses by a firewall and anti-virus software. Digital recordings will be copied to the server each day after interviews.
Signed consent forms will be stored in a locked cabinet in the office. Interview recordings and transcripts, which may contain personal information, will be password protected at file-level and stored on the server.
Original versions of the files will always be kept on the server. If copies of files are held on a laptop and edits made, their file names will be changed.”
44. When you archive…
• Save the data in both its proprietary and non-proprietary format (e.g. Excel and CSV; Microsoft Word and ASCII)
• Consider any restrictions on your data (copyright, patent, privacy, etc.)
• When possible/mandated/desired, share your data online with a persistent identifier (DOI or ARK)
• Include a data citation and state how you want to get credit for your data
• Link your data to your publications as often as possible
45. Major takeaways
• Data management starts at the beginning of
a project
• Document your data so that someone else
could understand it
• Have more than one copy of your data
• Consider archiving options when you are
done with your project
Specifically we are going to be talking about data management of your research data, but some of the principles will help you when thinking about the organization of any digital materials: your notes, your PowerPoints, your grocery lists….
Most of these concepts are pretty straightforward, they almost seem like common sense, but the reality is that very few people manage their data well and if you do, you will be at a big advantage.
Overview of what we will be covering in this session. Each of these could be a one hour course, but we are going to hit the highlights so to speak.
Introductions
Name
Major
Are you working on a research project?
Let’s start at the very beginning…
This definition is from the Penn State Library website.
Controlling often means planning, organizing, and sharing that data effectively. Being thoughtful and deliberate in your data practices
Really about being informed, deliberate, and in control of the lifecycle of your data.
This is the most commonly cited definition when someone wants to pin a definition on data, which is surprisingly difficult to do.
What data really is is evidence. Or as Michael Buckland puts it “alleged evidence”. It’s what you are putting forth as evidence for your research findings. “We’ve looked at all this stuff” using these methods and here are our conclusions.
Research papers often give methods and conclusions but what they don’t usually contain is the underlying data or evidence.
So what is data – EVIDENCE FOR YOUR RESEARCH
One of the characteristics of data is that it tends to be incredibly diverse.
Scientific data – observations, computational models, lab notebooks
Social sciences – results of surveys, video recordings, field notes
Humanities – text mining, newspapers, records of human history
Another attribute of data is that it tends to get messy
Most of us just don’t realize this because our messy, disorganized files are locked up in a neat little box called your computer.
Don’t believe me? How long would it take you to find a photo from five years ago on your computer? Here is a hint. If your image files start with DSC_ or IMG_ and some number following it, it will probably take you a very long time.
If most people’s digital files were analog, this is exactly what they would look like.
The main reason you should manage your data is for yourself and for your own research team.
Data management is one of those essential skills you need to get, just like learning how to manage citations or understand research methods.
But it can feel a bit boring like filing. But six months later when you want to locate a file, or even understand your file, your future self will thank you.
The most important reason to have good data management is for your own good and the good of your research team. If you want to be able to locate your files or understand your files in the future, good data management is crucial. Plus, unlike research methods and managing citations, this is something that even seasoned scientists are not very good at. So you will have something to offer your research team in the future, even as a young scientist.
Your best collaborator is yourself six months from now, and your past self doesn’t answer emails.
https://www.youtube.com/watch?v=N2zK3sAtr-4
Hopefully by now you can all see why data management is important. Now we’re going to think a little more deeply about how we can avoid the “Two bears” situation.
Let’s look at this scenario…
Get in teams of 2-3 and discuss…
Retain the data for a period of at least five years
Put your data in a repository
Keep multiple copies
Keep excellent documentation of your data practices, methods, workflows, and sources, plus any code you may have generated.
The most important thing you can do is to have and follow a data management plan. Next we are going to move on and talk a little bit about these data management plans that funding agencies are requiring (and I am promoting as a good idea in general!!)
Your DMP should answer three main questions…
Mention that in the UK your data management plan has to show that you’ve already looked for existing data. – ESRC
Email me I would be happy to send you more examples.
We’ve talked in broad strokes about data management but now we are going to focus in on some of the more specific aspects of managing data well.
One of the simplest things that you can do is to be more consistent with file naming, version control, and folder structures.
This section has a lot to do with organizing and naming your research materials so that you can find them later and so they will open in any environment.
We’ve talked about data management at kind of a high level. What is data? Why should you manage it well?
Now we are going to talk about some of the nuts and bolts of data management. Starting with file naming. How do you currently name files? Do you have a system?
To some extent we are all guilty of bad file naming but when it comes to your research it is important to create a system that makes sense not just to you, but other people as well.
Here are some examples of bad file names because they aren’t descriptive and don’t help us find the file later, and also because there is a possibility that these files will be overwritten the next time you name a file the same thing.
File names should reflect the contents of a file and enough information to uniquely identify the data file without getting way too long.
Don’t be generic in your file names
Be consistent!!!!
Your file name may include project acronym, location, investigator, date of data collection, data type, and version number. Whatever will help you or someone else uniquely identify that file in the future.
Think about what can be added and what can be omitted in your file names. If you are the only person on a project, you probably don’t need your name. If there are going to be multiple versions of a file, make sure you add a version number or a date to differentiate.
Here are some file naming best practices that will make sure your file will open in any environment.
Special characters can have special meaning in certain programming languages and operating systems and can be misinterpreted in file names.
Uppercase lettering can affect sort order. Ex: $ marks the beginning of a variable name in PHP. A backslash designates file path locations in the Windows operating system.
Spaces make things easier for humans to read but some browsers and software don’t know how to interpret spaces. Sometimes it only reads a file up to the space, which can cause problems.
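These naming rules can even be checked automatically before files enter a shared project folder. A minimal sketch, where the allowed-character pattern is one reasonable choice rather than a universal standard:

```python
import re

# Allow only letters, digits, underscores, hyphens, and one extension.
# Spaces and special characters ($, \, parentheses, etc.) are rejected.
SAFE_NAME = re.compile(r"^[A-Za-z0-9_-]+\.[A-Za-z0-9]+$")

def is_safe_filename(name: str) -> bool:
    """Return True if the file name avoids spaces and special characters."""
    return bool(SAFE_NAME.match(name))
```

A name like `ULA_interview_20150512_v02.txt` passes; `my data (final!!).xlsx` does not, for exactly the reasons above.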
There are also best practices around version control and numbering.
Version control is often achieved by using dates or a standard numbering system
#1 is the best one.
Descriptive
Not too long, not too short
#2 is the best choice here.
First example here has spaces, irregular dates that won’t line up in order, special characters
Third example may not be descriptive enough for a secondary user. Also, beware of using “FINAL” as opposed to a standardized numbering system.
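Part of why a standardized numbering system beats “FINAL” is that zero-padded version numbers and YYYYMMDD dates sort chronologically as plain strings in any file listing. A quick demonstration with invented file names:

```python
# Zero-padded versions and YYYYMMDD dates sort chronologically
# even under a naive alphabetical sort.
files = [
    "interview_20150512_v02.txt",
    "interview_20141103_v01.txt",
    "interview_20150512_v10.txt",
]
print(sorted(files))
# Oldest file first; v02 sorts before v10 thanks to the zero-padding.
```

Without the zero-padding, "v10" would sort before "v2", and a date like "5-12-2015" would not line up in order at all.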
That is how to name an individual file. What about your whole file structure?
All your research materials need to be in one folder. The top-level folder should include the project title and year. If it spans multiple years, include the first and last year in the title.
The substructures should have a clear and consistent naming convention that is documented in a README file.
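Setting up that structure at the start of a project can even be scripted, so every team member begins from the same layout. A sketch with pathlib; the project and folder names here are illustrative, not prescribed:

```python
from pathlib import Path

# Hypothetical top-level folder: project title plus year range.
project = Path("OralHistoryProject_2014-2015")

# Consistent, documented substructure for the research materials.
for sub in ["transcripts", "audio", "consent_forms", "documentation"]:
    (project / sub).mkdir(parents=True, exist_ok=True)

# A README at the top level documents the naming conventions.
(project / "README.txt").write_text(
    "Folder conventions: transcripts/ holds .txt files named "
    "lastname_YYYYMMDD_vNN.txt; audio/ holds the matching recordings.\n"
)
```

Running the script twice is harmless (`exist_ok=True`), so it doubles as a check that the expected folders are present.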
Exercise!!
Possible solutions:
Organize by type of file (all transcripts in one folder all audio recordings in another)
Organize by person (Have a Cliff Barrett folder and a Robert Bennett folder)
Problems with file names:
Dates are not standardized
Special characters/spaces
File type in the file name which is unnecessary
Unnecessary information in file name – “found on Internet, think okay, better than mine” picture
NO consistency to file naming
Next we are going to talk about data description.
A third characteristic of data is that it often needs context in order to be understandable
If you have a spreadsheet of survey responses, you need to have the survey to understand the responses.
You also need the codebook that explains your variable names and the values that you used, how you cleaned your data. Once again, try to think how a secondary user would interpret your data.
When we say metadata we are really talking about two things: human readable documentation and machine-readable metadata
The importance of documenting your data throughout your research project cannot be overstated.
Document your data with a certain level of reuse in mind. Replication? Verification? inspection?
First and foremost, metadata includes any surrounding documentation you may need to make sense of your data. An excel spreadsheet of survey responses is fairly useless if you haven’t kept the survey that generated those responses.
If you are working in spreadsheets, there are three levels of documentation.
http://datalib.edina.ac.uk/mantra/documentation_metadata_citation/
FRLP – Free reduced lunch program
You must make a codebook and include it in your documentation.
This is documenting at a variable level. It’s just as important that you document at a Project and file level as well.
If you want to learn more about codebooks and how to create good ones, once again I highly recommend going to the ICPSR website and looking at their Guide to Codebooks
Metadata is very, very important for other people looking to use your project.
Often called data about data.
Structured information about an object.
Mention that there are standards for creating metadata (Dublin Core) including subject specific data.
Through the course of your research your data needs to be stored securely, backed up, and maintained regularly. Once again this sounds like common sense, but you will be happy when you pay some attention to it. (e.g. when your laptop crashes or is stolen.).
I’m going to play a short video clip that has nothing to do with research data, but I think it perfectly captures the way we approach the storage aspect of data management.
https://www.youtube.com/watch?v=QyMgNZHtdk8
#1 rule of data storage – never just keep your data on one device. You are one dropped computer, one spilled glass of water, one unscrupulous thief away from losing all of your data. Every single day I go to Mom’s Café and see people leave their computers at their table while they go to the bathroom or grab a cup of coffee.
LOCKSS (Lots of Copies Keep Stuff Safe) – there should never be just one copy of your data. Do you back up your data? This is the most important data management task. No fewer than two, preferably three, copies of research data.
How well are you covered against unexpected loss? Make sure that when disaster strikes, it isn’t a disaster
There are three common options for storing your data:
Personal computers and laptops – Convenient for storing your data while in use. Should not be used for storing master copies of your data.
Networked drives – Highly recommended. You can share data. Your data is stored in a single place and backed up regularly. Available to you from any place at any time. If using a department drive or Box, your data is stored securely, thereby minimizing the risk of loss, theft, or unauthorized access. BEST!!!
External storage devices – thumb drives, flash drives, external hard drive. Cheap, easy to store and pass around. Feel better knowing it’s in your hands where you can see it. Not recommended for the long-term storage of your data.
3-2-1 rule: keep 3 copies of your data, on 2 different types of storage media, with 1 copy in a separate physical location.
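Part of the 3-2-1 habit can be automated. A minimal sketch that copies a master file to each backup location with shutil; the paths shown are placeholders, not real drives:

```python
import shutil
from pathlib import Path

def backup(master: Path, destinations: list[Path]) -> None:
    """Copy the master file to each backup destination,
    creating the destination folders if needed."""
    for dest in destinations:
        dest.mkdir(parents=True, exist_ok=True)
        # copy2 preserves timestamps, which helps verify backups later.
        shutil.copy2(master, dest / master.name)

# Hypothetical example: a local backup folder plus a networked drive.
master = Path("data/survey_responses.csv")
# backup(master, [Path("backup_local"), Path("/Volumes/network_drive/backup")])
```

Run on a schedule (or after each work session), this keeps the master copy on the server and the duplicates fresh, instead of relying on memory.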
50 GB of free storage and an additional 50 GB if you are on a sponsored project.
Free!
Secure!
When you leave you can take a copy with you or create a new account
This is an example of social science research where the data are interview recording and transcripts.
Another area of data management that you will have to consider is data archiving.
Archiving is not the same thing as storage
Archiving adds additional value to your data.
Long-term preservation
Metadata
Sharable, usually through a persistent identifier
Makes data citable
There are lots of archiving options for your data. Some people choose to put their data on their website which is an option, but not a best practice.