Data Management for Graduate Students
1. Data Management for
Graduate Students
Marriott Library Graduate Student Workshop Series
Rebekah Cummings, Research Data Management Librarian
J. Willard Marriott Library, University of Utah
September 27, 2016
2. In the next hour…
• Introductions
• What are data?
• Why manage data?
• Data Management Plans
• Data Organization
• Metadata
• Storage and Archiving
• Questions
4. What is data management?
Activities and practices that support long-term preservation, access, and use of data
5. What are data?
“The recorded factual material
commonly accepted in the research
community as necessary to validate
research findings.”
- U.S. OMB Circular A-110
11. Two bears data
management problems
1. Didn’t know where he stored the data
2. Saved one copy of the data on a USB drive
3. Data was in a format that could only be read by
outdated, proprietary software
4. No codebook to explain the variable names
5. Variable names were not descriptive
6. No contact information for the co-author Sam Lee
12. Data Management Plans
• What data are generated by your research?
• What is your plan for managing the data?
• How will your data be shared?
13. Elements of a DMP
• Types of data, including file formats
• Data description
• Data storage
• Data sharing, including confidentiality or
security restrictions
• Data archiving and responsibility
• Data management costs
18. File naming best practices
1. Be descriptive, not generic
2. Appropriate length (about 25 chars or less)
3. Be consistent
4. Think critically about your file names
19. File naming best practices
• Files should include only letters,
numbers, and underscores/dashes.
• No special characters.
• No spaces; Use dashes, underscores, or
camel case (likeThis).
• Avoid case dependency. Assume this,
THIS, and tHiS are the same.
• Have a strategy for version control.
• Don’t overwrite file extensions
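The rules on this slide can be checked automatically. The following is a minimal sketch, not part of the original workshop: the function names `check_file_name` and `case_collisions` are hypothetical, and the `.csv` extensions in the usage note are added for illustration.

```python
import re
from collections import defaultdict

# Letters, numbers, underscores, and dashes only (the rule from the slide).
SAFE_CHARACTERS = re.compile(r"[A-Za-z0-9_-]+")


def check_file_name(name):
    """Return a list of problems found in a file name (empty if none)."""
    problems = []
    stem, dot, extension = name.rpartition(".")
    if not dot:  # no "." at all, so the whole name is the stem
        stem, extension = name, ""
    if " " in stem:
        problems.append("contains spaces")
    # Spaces are reported separately, so ignore them in this check.
    if not SAFE_CHARACTERS.fullmatch(stem.replace(" ", "")):
        problems.append("contains special characters")
    if not extension:
        problems.append("missing file extension")
    return problems


def case_collisions(names):
    """Group names that differ only by upper/lower case."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return [group for group in groups.values() if len(group) > 1]
```

For example, `check_file_name("July 24 2014_SoilSamples%_v6.csv")` reports both spaces and special characters, while `check_file_name("20140724_NSF_SoilSamples_Cummings.csv")` returns an empty list.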
21. Version Control - Numbering
Use leading zeros for scalability
With leading zeros (files sort in order): 001, 002, 003, 009, 010, 099
Without leading zeros (files sort out of order): 1, 10, 2, 3, 9, 99
Bonus Tip: Use whole numbers (v1, v2, v3) for major version changes and decimals for minor changes (v1.1, v2.6)
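The two sort orders on this slide can be reproduced in a few lines. This is an illustrative sketch; the helper `numbered_name` and the three-digit default width are assumptions, not something the slides prescribe.

```python
def numbered_name(prefix, number, width=3):
    """Build a name such as 'interview_007' with a zero-padded number."""
    return f"{prefix}_{number:0{width}d}"


numbers = [1, 2, 3, 9, 10, 99]

# File browsers usually sort names as text, so compare the text forms.
unpadded = sorted(str(n) for n in numbers)
padded = sorted(f"{n:03d}" for n in numbers)

print(unpadded)  # ['1', '10', '2', '3', '9', '99']
print(padded)    # ['001', '002', '003', '009', '010', '099']
```

Pick the width from the largest number of files you expect: three digits covers up to 999.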
22. Version Control - Dates
If using dates, use YYYYMMDD
June2015 = BAD!
06-18-2015 = BAD!
20150618 = GREAT!
2015-06-18 = This is fine too
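A short sketch of why the year-first formats sort correctly while the month-first format does not. The three dates are made up for illustration.

```python
from datetime import date

# Hypothetical data collection dates.
collection_dates = [date(2015, 6, 18), date(2014, 12, 1), date(2015, 1, 5)]

# Month-first text sorts by month, so the 2014 file ends up last.
month_first = sorted(d.strftime("%m-%d-%Y") for d in collection_dates)

# Year-first text sorts in true chronological order.
compact = sorted(d.strftime("%Y%m%d") for d in collection_dates)
dashed = sorted(d.isoformat() for d in collection_dates)

print(month_first)  # ['01-05-2015', '06-18-2015', '12-01-2014']
print(compact)      # ['20141201', '20150105', '20150618']
print(dashed)       # ['2014-12-01', '2015-01-05', '2015-06-18']
```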
23. From a DMP…
“Each file name, for all types of data, will
contain the project acronym PUCCUK; a
reference to the file content (survey,
interview, media) and the date of an event
(such as the date of an interview).”
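A convention like the one in this DMP excerpt is easier to follow when the names are generated rather than typed. The sketch below assumes an order (acronym, content, date), an underscore separator, and a YYYYMMDD date; the excerpt itself specifies only which parts appear, and `build_file_name` is a hypothetical helper.

```python
from datetime import date

PROJECT_ACRONYM = "PUCCUK"
CONTENT_TYPES = {"survey", "interview", "media"}


def build_file_name(content, event_date, extension):
    """Combine project acronym, content type, and event date into one name."""
    if content not in CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content!r}")
    # The order and separator here are assumptions, not part of the quoted DMP.
    return f"{PROJECT_ACRONYM}_{content}_{event_date:%Y%m%d}.{extension}"


print(build_file_name("interview", date(2015, 6, 18), "txt"))
# PUCCUK_interview_20150618.txt
```

Rejecting unknown content types keeps the vocabulary consistent across a team.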
25. Who filed better?
• July 24 2014_SoilSamples%_v6
• 20140724_NSF_SoilSamples_Cummings
• SoilSamples_FINAL
26. Structuring folders and files
• Consider all the types of files you will handle during the course of your project.
• Develop a nested folder structure that makes sense for your project and your team’s retrieval needs.
• Name folders clearly, without special characters.
• Use a standard folder structure for each project or subproject (including making folders for files not yet created).
• Create a reference document (README file) that notes the purpose of the different folders.
University of Massachusetts Medical School Library http://libraryguides.umassmed.edu/file_management
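One way to get the same structure for every project is to create it with a script. This is a sketch under stated assumptions: the folder names and purposes in `FOLDERS` are hypothetical examples, not a structure recommended by the slides, and `create_project_skeleton` is an illustrative helper.

```python
from pathlib import Path

# Hypothetical standard layout; adapt the names to your own project.
FOLDERS = {
    "data/raw": "Original data files, never edited",
    "data/processed": "Cleaned or derived data",
    "documentation": "Codebooks, protocols, approvals",
    "analysis": "Scripts and code",
    "outputs": "Figures, tables, drafts",
}


def create_project_skeleton(root):
    """Create the nested folders and a README that notes each folder's purpose."""
    root = Path(root)
    lines = [f"Folder structure for {root.name}", ""]
    for relative, purpose in FOLDERS.items():
        # Folders are created up front, even for files that do not exist yet.
        (root / relative).mkdir(parents=True, exist_ok=True)
        lines.append(f"{relative}: {purpose}")
    readme = root / "README.txt"
    readme.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return readme
```

Calling `create_project_skeleton("soil_project")` creates the folders under `soil_project` and writes `README.txt` at the top level.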
30. Research Documentation
• Grant proposals and related reports
• Applications and approvals (e.g. IRB)
• Codebooks, data dictionaries
• Consent forms
• Surveys, questionnaires, interview protocols
• Transcripts, hard copies of audio and video files
• Any software or code you used (no matter how
insignificant or buggy)
32. What goes in a codebook?
• Variable name
• Variable meaning
• Variable data types
• Precision of data
• Units
• Known issues with the data
• Relationships to other
variables
• Null values
• Anything else someone
needs to better understand
the data
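A codebook can be as simple as a CSV file with one row per variable. In the sketch below the column names follow the list on this slide, but the two variables (`site_id`, `soil_ph`) and all of their values are invented for illustration.

```python
import csv
import io

CODEBOOK_FIELDS = [
    "variable", "meaning", "data_type", "precision",
    "units", "null_value", "known_issues", "related_variables",
]

# Hypothetical entries; replace them with your own variables.
codebook_rows = [
    {"variable": "site_id", "meaning": "Sampling site identifier",
     "data_type": "text", "precision": "", "units": "",
     "null_value": "NA", "known_issues": "", "related_variables": ""},
    {"variable": "soil_ph", "meaning": "Soil pH at sampling site",
     "data_type": "decimal", "precision": "0.1", "units": "pH",
     "null_value": "-999", "known_issues": "Probe recalibrated mid-season",
     "related_variables": "site_id"},
]


def codebook_to_csv(rows):
    """Write codebook rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CODEBOOK_FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(codebook_to_csv(codebook_rows))
```

Saving this text next to the data file means the variable names stay explained even if the data moves.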
33. Metadata
Unstructured Data
There was a study put out by Dr. Gary Bradshaw from the University of Nebraska Medical Center in 1982 called “Growth of Rodent Kidney Cells in Serum Media and the Effect of Viral Transformation On Growth”. It concerns the cytology of kidney cells.
Structured Data
Title: Growth of rodent kidney cells in serum media and the effect of viral transformations on growth.
Author: Gary Bradshaw
Date: 1982
Publisher: University of Nebraska Medical Center
Subject: Kidney -- Cytology
34. At the very least…
• Title
• Creator
• Description
• Date
• Type
• Publisher
• Format
• Identifier (DOI)
• Rights
• Any other critical
information to understand
or cite the data.
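The minimum elements above can be kept as a small structured record and checked for gaps. This sketch reuses the values from the structured example on the previous slide; the lowercase field names and the `missing_fields` helper are assumptions for illustration, and fields that slide does not supply are left out rather than guessed.

```python
import json

# Minimum elements from the "At the very least…" slide.
REQUIRED_FIELDS = [
    "title", "creator", "description", "date", "type",
    "publisher", "format", "identifier", "rights",
]

# Values taken from the structured-data example; nothing else is filled in.
record = {
    "title": "Growth of rodent kidney cells in serum media "
             "and the effect of viral transformations on growth",
    "creator": "Gary Bradshaw",
    "date": "1982",
    "publisher": "University of Nebraska Medical Center",
    "subject": "Kidney -- Cytology",
}


def missing_fields(metadata):
    """List required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not metadata.get(field)]


print(json.dumps(record, indent=2))
print(missing_fields(record))
# ['description', 'type', 'format', 'identifier', 'rights']
```

Running a check like this before archiving shows what still needs to be written down.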
41. Language from a DMP
“All data files will be stored on the University server that is backed
up nightly. The University's computing network is protected from
viruses by a firewall and anti-virus software. Digital recordings will
be copied to the server each day after interviews.
Signed consent forms will be stored in a locked cabinet in the
office. Interview recordings and transcripts, which may contain
personal information, will be password protected at file-level and
stored on the server.
Original versions of the files will always be kept on the server. If
copies of files are held on a laptop and edits made, their file names
will be changed.”
44. When you archive…
• Save the data in both its proprietary and non-proprietary
format (e.g. Excel and CSV; Microsoft Word and ASCII)
• Consider any restrictions on your data (copyright, patent,
privacy, etc.)
• When possible/mandated/desired, share your data online
with a persistent identifier (DOI or ARK)
• Include a data citation and state how you want to get
credit for your data
• Link your data to your publications as often as possible
45. Your data librarians
Daureen Nesdill, Research Data Management Librarian, Sciences
Darell Schmick, Research Librarian, Health Sciences
Rebekah Cummings, Research Data Management Librarian, Social Sciences & Humanities
46. Major takeaways
• Data management starts at the beginning of
a project
• Document your data so that someone else
could understand it
• Have more than one copy of your data
• Consider archiving options when you are
done with your project
Specifically, we are going to be talking about managing your research data, but some of the principles will help you when thinking about the organization of any digital materials: your notes, your PowerPoints, your grocery lists….
Most of these concepts are pretty straightforward; they almost seem like common sense. But the reality is that very few people manage their data well, and if you do, you will be at a big advantage.
Overview of what we will be covering in this session.
Introductions
Name
Major
Are you working on a research project?
Data management refers to activities throughout the data lifecycle.
The activities surrounding data management include being a responsible researcher.
These activities happen during the research and after the research is completed.
This is the most commonly cited definition when someone wants to pin a definition on data, which is surprisingly difficult to do.
What data really is is evidence. Or as Michael Buckland puts it “alleged evidence”. It’s what you are putting forth as evidence for your research findings. “We’ve looked at all this stuff” using these methods and here are our conclusions.
Research papers often give methods and conclusions but what they don’t usually contain is the underlying data or evidence.
So what is data – EVIDENCE FOR YOUR RESEARCH
One of the characteristics of data is that it tends to be incredibly diverse.
Scientific data – observations, computational models, lab notebooks
Social sciences – results of surveys, video recordings, field notes
Humanities – text mining, newspapers, records of human history
Each field tends to have its own practices around data collection, analysis, sharing, etc.
Another attribute of data is that it tends to get messy
Most of us just don’t realize this because our messy, disorganized files are locked up in a neat little box called your computer.
Don’t believe me? How long would it take you to find a photo from five years ago on your computer? Here is a hint. If your image files start with DSC_ or IMG_ and some number following it, it will probably take you a very long time.
If most people’s digital files were analog, this is exactly what they would look like.
Why manage data?
The main reason you should manage your data is for yourself and for your own research team.
Data management is one of those essential skills you need to get, just like learning how to manage citations or understand research methods.
But it can feel a bit boring, like filing. Six months later, though, when you want to locate a file, or even understand your file, your future self will thank you.
The most important reason to have good data management is for your own good and the good of your research team. If you want to be able to locate your files or understand your files in the future, good data management is crucial. Plus, unlike research methods and managing citations, this is something that even seasoned scientists are not very good at. So you will have something to offer your research team in the future even as a young scientist.
USE THE “DOING YOUR TAXES” ANALOGY: if you’ve been managing your receipts effectively and compiling spreadsheets throughout the year, you will be in much better shape in April. Can you scramble for information at the end? Of course! But you are not maximizing your time and resources. You are likely not getting the returns you should and you are wasting time. Sometimes, the documentation isn’t available later. If you keep up with it all year, you won’t have to make guesses.
NSF is now starting to look at DMPs as part of their post-award assessment, checking to see if researchers did what they said they were going to do with data.
https://www.youtube.com/watch?v=N2zK3sAtr-4
The most important thing you can do is to have and follow a data management plan. Next we are going to move on and talk a little bit about these data management plans that funding agencies are requiring (and I am promoting as a good idea in general!!)
Your DMP should answer three main questions…
Mention that in the UK your data management plan has to show that you’ve already looked for existing data. – ESRC
Email me; I would be happy to send you more examples.
We’ve talked in broad strokes about data management, but now we are going to focus in on some of the more specific aspects of managing data well.
One of the simplest things that you can do is to be more consistent with file naming, version control, and folder structures.
This section has a lot to do with organizing and naming your research materials so that you can find them later and so they will open in any environment.
We’ve talked about data management at kind of a high level. What is data? Why should you manage it well?
Now we are going to talk about some of the nuts and bolts of data management. Starting with file naming. How do you currently name files? Do you have a system?
To some extent we are all guilty of bad file naming but when it comes to your research it is important to create a system that makes sense not just to you, but other people as well.
Here are some examples of bad file names. They aren’t descriptive, so they don’t help us find the file later, and there is a possibility that these files will be overwritten the next time you give a file the same name.
File names should reflect the contents of a file and include enough information to uniquely identify the data file without getting way too long.
Don’t be generic in your file names
Be consistent!!!!
Your file name may include project acronym, location, investigator, date of data collection, data type, and version number. Whatever will help you or someone else uniquely identify that file in the future.
Think about what can be added and what can be omitted in your file names. If you are the only person on a project, you probably don’t need your name. If there are going to be multiple versions of a file, make sure you add a version number or a date to differentiate.
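One way to stay consistent is to build every file name from the same components in the same order. Here is a minimal sketch of that idea. The function name, the component order, the underscore separator, and the "OHP" project acronym are all assumptions made for illustration, not a required convention.

```python
from datetime import date

def build_file_name(project, data_type, collected, version, extension, investigator=None):
    """Join the name components with underscores, always in the same order."""
    parts = [project]
    if investigator:
        # Only include a name when more than one person is on the project.
        parts.append(investigator)
    parts.append(data_type)
    parts.append(collected.strftime("%Y%m%d"))  # date of data collection
    parts.append(f"v{version:02d}")             # zero-padded version number
    return "_".join(parts) + "." + extension

# "OHP" is a made-up project acronym used only for this example.
print(build_file_name("OHP", "transcript", date(2015, 3, 4), 2, "txt"))
# prints OHP_transcript_20150304_v02.txt
```

Whatever order you pick, write it down so that everyone on the project builds names the same way.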
Here are some file naming best practices that will make sure your file will open in any environment with any operating system.
Special characters can have special meaning in certain programming languages and operating systems and can be misinterpreted in file names.
Ex: $ marks the beginning of a variable name in PHP. A backslash designates file path locations in the Windows operating system. Uppercase lettering can also affect how files sort.
Spaces make things easier for humans to read but some browsers and software don’t know how to interpret spaces. Sometimes it only reads a file up to the space, which can cause problems.
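If you want to check names you already have, the rules above can be sketched as a small script. This is only an illustration: the "safe" character set (letters, digits, underscore, hyphen, period) and both function names are my assumptions, and your lab may allow a different set.

```python
import re

# Assumed safe set for this sketch: letters, digits, underscore, hyphen, period.
UNSAFE = re.compile(r"[^A-Za-z0-9._-]")

def find_problems(file_name):
    """Return a list of human-readable problems found in a file name."""
    problems = []
    if " " in file_name:
        problems.append("contains spaces")
    special = sorted(set(UNSAFE.findall(file_name)) - {" "})
    if special:
        problems.append("contains special characters: " + "".join(special))
    return problems

def clean_file_name(file_name):
    """Replace runs of whitespace with one underscore, then drop unsafe characters."""
    no_spaces = re.sub(r"\s+", "_", file_name.strip())
    return UNSAFE.sub("", no_spaces)

print(find_problems("my data $final&.xlsx"))
print(clean_file_name("my data $final&.xlsx"))  # prints my_data_final.xlsx
```

Cleaning a name this way fixes the characters, but it does not make the name descriptive. That part is still up to you.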
There are also best practices around version control and numbering.
Version control is often achieved by using dates or a standard numbering system
January, June, and July are going to line up next to each other.
April and August are going to be together
December is going to come before June, etc and all your Januarys from every year are going to line up.
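You can see the sorting problem for yourself with a few lines of code. This sketch uses made-up collection dates and a made-up "_survey.csv" suffix, and compares names that start with the month name against names that start with a YYYYMMDD date.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical collection dates, for illustration only.
collection_dates = [date(2014, 12, 1), date(2015, 1, 15), date(2015, 6, 30), date(2014, 1, 5)]

# Names that start with the month name sort alphabetically, not chronologically.
month_first = sorted(
    f"{MONTHS[d.month - 1]}_{d.day:02d}_{d.year}_survey.csv" for d in collection_dates
)

# Names that start with YYYYMMDD sort in true date order.
iso_first = sorted(d.strftime("%Y%m%d") + "_survey.csv" for d in collection_dates)

print(month_first)  # December 2014 comes first, and both Januarys sit together
print(iso_first)    # 20140105, 20141201, 20150115, 20150630
```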
#1 is the best one.
Descriptive
Not too long, not too short
#2 is the best choice here.
First example here has spaces, irregular dates that won’t line up in order, special characters
Third example may not be descriptive enough for a secondary user. Also, beware of using “FINAL” as opposed to a standardized numbering system.
That is how to name an individual file. What about your whole file structure?
All your research materials need to be in one folder. The top level folder should include the project title and year. If it is a multi-year project, include the first and last year in the title.
The substructures should have a clear and consistent naming convention that is documented in a README file.
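Here is a rough sketch of setting up that kind of structure with a script, so that every project starts out the same way. The subfolder names, the README wording, and the "OralHistory" project title are assumptions for illustration. Use whatever convention fits your project, as long as it is documented.

```python
import tempfile
from pathlib import Path

def create_project_skeleton(root, title, first_year, last_year=None,
                            subfolders=("data_raw", "data_processed", "documentation")):
    """Create a top-level project folder, its subfolders, and a README describing them."""
    if last_year is None or last_year == first_year:
        years = str(first_year)
    else:
        years = f"{first_year}-{last_year}"
    top = Path(root) / f"{title}_{years}"
    for name in subfolders:
        (top / name).mkdir(parents=True, exist_ok=True)
    readme_lines = [f"Project: {title} ({years})", "", "Folders:"]
    readme_lines += [f"  {name}/" for name in subfolders]
    readme_lines += ["", "File naming convention: describe it here."]
    (top / "README.txt").write_text("\n".join(readme_lines) + "\n", encoding="utf-8")
    return top

# Try it in a scratch directory so nothing is left behind.
with tempfile.TemporaryDirectory() as scratch:
    top = create_project_skeleton(scratch, "OralHistory", 2014, 2015)
    print(top.name)  # prints OralHistory_2014-2015
    print(sorted(p.name for p in top.iterdir()))
```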
Exercise!!
Possible solutions:
Organize by type of file (all transcripts in one folder all audio recordings in another)
Organize by person (Have a Cliff Barrett folder and a Robert Bennett folder)
Problems with file names:
Dates are not standardized
Special characters/spaces
File type in the file name which is unnecessary
Unnecessary information in file name – “found on Internet, think okay, better than mine” picture
NO consistency to file naming
Next we are going to talk about data description.
A third characteristic of data is that it often needs context in order to be understandable
If you have a spreadsheet of survey responses, you need to have the survey to understand the responses.
You also need the codebook that explains your variable names and the values that you used, how you cleaned your data. Once again, try to think how a secondary user would interpret your data.
When we say metadata we are really talking about two things: human readable documentation and machine-readable metadata
The importance of documenting your data throughout your research project cannot be overestimated.
Document your data with a certain level of reuse in mind. Replication? Verification? Inspection?
First and foremost, metadata includes any surrounding documentation you may need to make sense of your data. An excel spreadsheet of survey responses is fairly useless if you haven’t kept the survey that generated those responses.
If you are working with variables, you must make a codebook and include it in your documentation.
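A codebook does not need to be fancy. As a minimal sketch, it can be a small table with one row per variable, saved alongside the data. The variable names, value codes, and column headings below are invented for illustration and are not taken from a real survey.

```python
import csv
import io

# Hypothetical survey variables, for illustration only.
codebook = [
    {"variable": "q1_year", "description": "Year in school",
     "values": "1=first year; 2=sophomore; 3=junior; 4=senior", "missing": "-9"},
    {"variable": "q2_backup", "description": "Backs up research data regularly",
     "values": "0=no; 1=yes", "missing": "-9"},
]

def codebook_to_csv(rows):
    """Write the codebook rows as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=["variable", "description", "values", "missing"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(codebook_to_csv(codebook))
```

A secondary user who opens the data and sees a column called q2_backup can look it up here instead of guessing.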
Metadata is very important for other people looking to use your project.
Human readable vs. machine readable
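To make the machine-readable side concrete, here is a sketch of a small metadata record saved as JSON next to a data file. The field names are loosely modeled on Dublin Core elements (title, creator, date, description, format, rights). The function name and every value are assumptions for illustration, and a real repository will tell you which standard and fields it expects.

```python
import json

def make_metadata_record(title, creator, date_collected, description, file_format, rights):
    """Return a dictionary of descriptive fields for one data file."""
    return {
        "title": title,
        "creator": creator,
        "date": date_collected,  # ISO 8601 string such as "2015-03-04"
        "description": description,
        "format": file_format,
        "rights": rights,
    }

# All values below are placeholders.
record = make_metadata_record(
    title="Interview transcript 01",
    creator="Example Researcher",
    date_collected="2015-03-04",
    description="Transcript of a recorded interview for a hypothetical oral history project.",
    file_format="text/plain",
    rights="Contact the project PI before reuse.",
)

record_json = json.dumps(record, indent=2)
print(record_json)
```

A person can read this file, and so can software, which is the point of structured metadata.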
Most researchers are very protective of their data. You work hard to collect it and you have a huge intellectual investment in it. Also, since most people have never been asked to hand over or even share their data, the assumption is often that the researcher is the one who owns the data. The truth, however, is more complicated than that.
If you are an employee of the University, your data belongs to the University.
If you move your research, you can request to take a copy of your data with you.
UCSD/ USC court case – database
Usually the PI is responsible for the data – data governance
Through the course of your research your data needs to be stored securely, backed up, and maintained regularly. Once again this sounds like common sense, but you will be happy you paid some attention to it (e.g. when your laptop crashes or is stolen).
I’m going to play a short video clip that has nothing to do with research data, but I think it perfectly captures the way we approach the storage aspect of data management.
https://www.youtube.com/watch?v=QyMgNZHtdk8
#1 rule of data storage – never just keep your data on one device. You are one dropped computer, one spilled glass of water, one unscrupulous thief away from losing all of your data. Every single day I go to Mom’s Café and see people leave their computers at their table while they go to the bathroom or grab a cup of coffee.
LOCKSS - There should never just be one copy of your data. Do you back up your data? It is the most important data management task. No fewer than two, preferably three, copies of research data.
How well are you covered against unexpected loss? Make sure that when disaster strikes, it isn’t a disaster.
There are three options for storing your data:
Personal computers and laptops – Convenient for storing your data while in use. Should not be used for storing master copies of your data.
Networked drives – Highly recommended. You can share data. Your data is stored in a single place and backed up regularly. Available to you from any place at any time. If you are using a department drive or Box, your data is stored securely, thereby minimizing the risk of loss, theft, or unauthorized access. BEST!!!
External storage devices – thumb drives, flash drives, external hard drives. Cheap, easy to store and pass around. You feel better knowing it’s in your hands where you can see it. Not recommended for the long-term storage of your data.
3-2-1 rule – 3 copies of your data, on 2 different types of storage media, with 1 copy in a different physical location.
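Making extra copies is only useful if the copies are good. Here is a minimal sketch of copying a master file to more than one backup location and confirming with a checksum that each copy matches. The function names and the folder and file names in the demo are assumptions for illustration. In practice the backup folders would be on different devices or services, not in one scratch directory.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(master, backup_dirs):
    """Copy the master file into each backup folder and verify every copy."""
    master = Path(master)
    expected = sha256_of(master)
    copies = []
    for directory in backup_dirs:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = Path(shutil.copy2(master, directory / master.name))
        if sha256_of(copy_path) != expected:
            raise OSError(f"Checksum mismatch for {copy_path}")
        copies.append(copy_path)
    return copies

# Demo in a scratch directory: one master file plus two verified copies.
with tempfile.TemporaryDirectory() as scratch:
    master = Path(scratch) / "OHP_transcript_20150304_v02.txt"
    master.write_text("example data\n", encoding="utf-8")
    copies = back_up(master, [Path(scratch) / "backup_a", Path(scratch) / "backup_b"])
    print(len(copies) + 1, "copies in total")  # prints 3 copies in total
```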
1 TB free storage and an additional 50 GB if you are on a sponsored project.
Free!
Secure!
When you leave you can take a copy with you or create a new account
This is an example of social science research where the data are interview recordings and transcripts.
Another area of data management that you will have to consider is data archiving.
Archiving is not the same thing as storage
Archiving adds additional value to your data.
Long-term preservation
Metadata
Sharable, usually through a persistent identifier
Makes data citable
There are lots of archiving options for your data. Some people choose to put their data on their website, which is an option but not a best practice.