Data, data everywhere!
Managing & organizing data
[Date]
[Name of Presenter]
[Library logo]
This work is licensed under CC BY 4.0 and is adapted from the
University of Michigan Library presentation at https://doi.org/10.7302/8684
Technology
Overview
● Live transcription is available
● Recording is on
● Remain muted during
presentation
● Use chat to share questions
and comments during the
presentation
● Evaluation at the end
● Email follow-up with slides &
recording
Learning Objectives
● Recognize why organizing your data is important for enabling
high-quality research.
● Understand recommended practices for keeping your data organized.
● Identify ways to describe your data.
● Be able to experiment with practical strategies to start organizing your
data.
Have you ever lost
some data or notes
because of lack of
organization?
Poll
● Text
● Numerical
● Multimedia
● Models
● Software
What is data?
Raw Data
Analyzable
Data
Shareable
Data
Categories of data
● Observational: sensor readings,
telemetry, survey results, videos,
images
● Experimental: gene sequences,
chromatograms, magnetic field
readings, and spectroscopy
● Simulation: climate models,
economic models, and systems
engineering
● Compiled: text and data mining,
compiled database, systems
engineering, and 3D models
● Reference: gene sequence
databanks, census data, chemical
structures
Data Types & File Formats
Why is organizing and
managing data important?
1 rat heart
100s
of slices
100s of
slides
1000s of
image files
TIF TIF TIF TIF TIF TIF
100s of huge
images
3 postdocs
5-7 experiments
a week…
Data management for yourself
● Efficiently find your files
● Track your methods for
reproducibility
● Better version control of data
● Quality control
● Avoid data loss
● Document your data for your
own recollection,
accountability, and re-use
● Gain credibility and
recognition for your science
efforts through data sharing!
Data management for science
● Data is a valuable asset. It is expensive and time-consuming to collect, so take good care of it!
● Well-managed data:
○ improves quality, accuracy, and integrity of your research
○ maximizes the effective use of data
○ ensures appropriate use of data and information
○ strengthens the reliability of the research - promotes transparency, encourages
accountability, reduces bias and errors
○ ensures sustainability and accessibility allowing others to reproduce your findings
Research reproducibility
● Reproducibility is the ability of other researchers to reach similar
results when using the same methods and data.
○ Replicability is achieving similar results by conducting a new study with different
methods or approaches
● Accurate, comprehensive, and transparent reporting allows for
reproducibility.
Data management challenges
● Good data management takes time and planning
● Researchers may lack knowledge about best practices in handling data
● General lack of incentives for doing good data management
● There can be a cost to managing data (human, technology, etc.)
Poor data management affects everyone
“MEDICARE PAYMENT ERRORS NEAR $20B” (CNN) December 2004. Miscoding and billing errors from
doctors and hospitals totaled $20 billion in FY 2003 (9.3% error rate). The error rate measured claims
that were paid despite being medically unnecessary, inadequately documented, or improperly coded.
This error rate actually was an improvement over the previous fiscal year (9.8% error rate).
“SOCIAL SECURITY DATA CAN TURN PEOPLE INTO THE LIVING DEAD” (NPR) August 2016. In 2011, an
audit found that about 1,000 people a month in the U.S. were marked deceased when they were very
much alive. Rona Lawson, who works in the Office of the Inspector General at the Social Security
Administration, says that number has gone down. It's now around 500 people a month. Lawson says 90
percent of the time, the cascade of misinformation starts with an input error by Social Security staff — a
regular mistake on a regular office day that just happens to kill a person off, at least on paper.
The climate scientists at the centre of a media
storm over leaked emails were cleared of
accusations that they manipulated their
results and silenced critics, but a review found
they had failed to be open enough about their
work.
[Image removed for copyright purposes. Alt text:
Screenshot of article from The Guardian UK titled
"Climategate scientists cleared of manipulating
data on global warming”]
“had published duplicate
pictures in several cases
and had repeatedly failed
to exert due diligence in
organising her area of
study over a long period
of time.”
http://retractionwatch.com/2016/11/02/leading-diabetes-
researcher-acted-negligently-probe-concludes/
[Image removed for copyright purposes. Alt text:
Screenshot of Retraction Watch announcement
titled "Leading diabetes researcher acted
negligently, probe concludes”]
Don't end up here!
NEJM paper on sleep apnea retracted when original data can’t be found
● Multiple errors in
table
● Did not alter
conclusions in
article
● BUT, could not
locate primary data
[Image removed for copyright purposes. Alt text:
Screenshot of a Retraction Watch announcement
titled "NEJM paper on sleep apnea retracted when
original data can't be found”]
Data management
and documentation
could have
prevented these
problems!
How can I manage &
organize my data?
● Communicate thoroughly
○ Provide training
○ Make documentation findable
○ Use a variety of media (email, Slack, lab manual)
○ Regularly check in with and/or remind all contributors
Keys to success
● Maintain good documentation
○ Readme files
○ Data management plan (DMP)
If you can, assign
one person to be
responsible
Organizing tactics
1. Identify storage solution(s)
2. Establish directory/folder structures
3. Develop file naming convention(s)
4. Decide on file formats
1. Identify storage solution(s)
● May have multiple storage locations
● Investigate ITS data storage finder:
Data Storage Finder / U-M Information and Technology Services
● Set up appropriate access/permissions
Document!
2. Establish directory/folder structures
● Organize directories hierarchically
● Group files of similar information together in a single directory
● Name directories after aspects of the project rather than individual
researchers
● Separate ongoing and completed work
● Once you have decided on a directory structure, follow it consistently
and audit it periodically
Document!
Organising
— UK Data
Service
Good data practices -
Dryad
[Image removed for copyright purposes. Alt text: A
screenshot showing two ways to organize folders
and files. The first is Organized by file type with one
folder for data, both processed and raw, and
another folder for results. The second way is
organized by analysis where there are two folders,
figure 1 and figure 2, and within each of those
folders are the corresponding data and results
folders.]
[Image removed for copyright
purposes. Alt text: A screenshot of a
series of folders showing how they
are organized. The top level includes
a Data folder and a Documentation
folder. Within the Data folder, there
are folders for Databases, Images,
Models, Sound, and Text. Within the
Documentation folder there are
folders for Consent Forms,
Information Sheets, and
Methodology.]
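A folder hierarchy is easier to follow consistently if it is created by a script. The minimal sketch below uses Python's pathlib to build the Data and Documentation layout described in the second example above. The project name my_project, the underscore-style folder names, and the function name create_structure are hypothetical choices, not part of the original slides.

```python
from pathlib import Path
import tempfile

# Folder layout modeled on the example above; adjust to your own project.
STRUCTURE = {
    "Data": ["Databases", "Images", "Models", "Sound", "Text"],
    "Documentation": ["Consent_Forms", "Information_Sheets", "Methodology"],
}

def create_structure(root: Path) -> list[Path]:
    """Create the folder hierarchy under root and return the created paths."""
    created = []
    for parent, children in STRUCTURE.items():
        for child in children:
            folder = root / parent / child
            folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
            created.append(folder)
    return created

if __name__ == "__main__":
    # Demo in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for path in create_structure(Path(tmp) / "my_project"):
            print(path.relative_to(tmp))
```

Re-running the script on an existing project does no harm, so it can double as a quick check that the agreed folders are still in place.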
3. Develop file naming convention(s)
File names should:
● embody the content of the file
● have intuitive (non-cryptic) names where possible
● be extensible
● be unique, where possible and practical
● not use special characters – restrict file names to
numbers, letters, and underscores
● be named using consistent, documentable rules
Document!
May need to include:
● versioning
● multiple
conventions
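The character rule above can be checked automatically. The sketch below is one possible check, assuming the rule applies to the part of the name before the extension. The function name is_valid_name and the second sample file name are made up for illustration.

```python
import re
from pathlib import Path

# Letters, numbers, and underscores only (the rule from the list above).
ALLOWED = re.compile(r"[A-Za-z0-9_]+")

def is_valid_name(filename: str) -> bool:
    """Return True if the name before the extension uses only allowed characters."""
    stem = Path(filename).stem
    return ALLOWED.fullmatch(stem) is not None

if __name__ == "__main__":
    for name in ["AtherRat_012_056_mb_0423_raw.csv", "final data (v2)!.xlsx"]:
        print(name, "->", "ok" if is_valid_name(name) else "rename")
```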
Implement version control (versioning)
Piled Higher and Deeper by Jorge Cham - PhDComics
[Comic removed for copyright purposes]
ISO 8601 - Formatting Dates
Standard way to format date and time - extremely helpful for file naming
conventions
YYYY-MM-DD
YYYYMMDD
ISO 8601 — Date and time format
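In Python, both forms can be produced with the standard library. This is a small illustration only; the date and the file name pattern are arbitrary examples.

```python
from datetime import date

d = date(2024, 3, 5)  # arbitrary example date

extended = d.isoformat()        # YYYY-MM-DD
basic = d.strftime("%Y%m%d")    # YYYYMMDD

print(extended)  # 2024-03-05
print(basic)     # 20240305

# Names that start with an ISO 8601 date sort chronologically as plain text.
names = [f"{date(2024, m, 1).strftime('%Y%m%d')}_readings.csv" for m in (11, 2, 7)]
print(sorted(names))
```

Because the year comes first and every field is zero-padded, an ordinary alphabetical sort in a file browser matches date order.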
Examples
AtherRat_012_056_mb_0423_raw.csv
AtherRat = experiment name
012 = experiment number
056 = sample number
mb = stain used, methylene blue
0423 = 2-digit coordinates of image (4
across, 23 down)
raw = data stage
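A convention like this is easy to take apart in code, which is one reason to keep it consistent. The sketch below splits the example name on underscores. The field names and the parse_name function are hypothetical labels for the parts listed above.

```python
from pathlib import Path
from typing import NamedTuple

class ImageName(NamedTuple):
    experiment: str   # experiment name
    exp_number: str   # experiment number
    sample: str       # sample number
    stain: str        # stain code, e.g. mb = methylene blue
    across: int       # image position across
    down: int         # image position down
    stage: str        # data stage

def parse_name(filename: str) -> ImageName:
    """Split a name that follows the convention above into its parts."""
    parts = Path(filename).stem.split("_")
    if len(parts) != 6:
        raise ValueError(f"unexpected file name: {filename}")
    experiment, exp_number, sample, stain, coords, stage = parts
    if len(coords) != 4 or not coords.isdigit():
        raise ValueError(f"unexpected coordinates: {coords}")
    return ImageName(experiment, exp_number, sample, stain,
                     int(coords[:2]), int(coords[2:]), stage)

if __name__ == "__main__":
    print(parse_name("AtherRat_012_056_mb_0423_raw.csv"))
```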
File naming and organization of data
- University of Ottawa Library
[Image removed for copyright purposes. Alt text:
Chart showing a filename
Project_YYYYMMDD_ContentDescription_Version.ext.
The parts of the filename are defined as the
Project name, date, description of file content,
and version information. Underscores are used
to separate the different parts of the filename.]
4. Decide on file formats
Whenever possible, select file formats that are:
● non-proprietary
● unencrypted
● uncompressed
● in common usage by the research community
● adherent to an open, documented standard
● interoperable among diverse platforms and applications
● royalty-free and without intellectual property restrictions
● developed and maintained by an established open standards organization
Consider
instrument/device
settings
Document!
Recommended file formats
Audio: WAVE, AIFF, MP3, MXF
Containers: TAR, GZIP, ZIP
Databases: XML, CSV
Statistics: ASCII, DTA, POR, SAS,
SAV
Still images: TIFF, JPEG 2000, PNG,
GIF
Tabular data: CSV
Text (documentation, scripts):
XML, PDF/A, Plain Text (ASCII,
UTF-8)
Video: MOV, MPEG, AVI, MXF
Recommended Formats Statement – table of contents | Resources (Preservation, Library of Congress)
How should I describe
my data?
Documentation
Good data documentation helps ensure accurate reporting of data and
methods:
● Project level - describes the procedures and methods used for data
collection and analysis (workflow, protocols, instruments, etc.)
● Metadata - data about your data
● Data dictionaries - describe the variables in a dataset
● README files - dataset description, guide to files and technology
Workflows
The meaning of data depends
on the context of how it was
collected.
[Decorative workflow image
removed for copyright purposes.]
Internal workflow
● How and when will the work be done?
● Will data be reviewed for quality?
● Who manages the entire process?
What is metadata?
It's the “data about data.” Structured (may follow documentation standards)
information that makes it easier to retrieve, use, or manage data. It can include
things like:
● Dates, times, locations
● Hardware/software information and parameters
● Methodology
● Creators
● Copyright
● Formats
Perkel, J. M. (2023). How to make your scientific data accessible,
discoverable and useful. Nature, 618(7967), 1098–1099.
https://doi.org/10.1038/d41586-023-01929-7
[Image removed for copyright purposes. Alt text:
Screenshot of a section of the article that says:
"Nature asked data scientists about their best
practices for publishing usable, high-quality data -
here's what they said. Craft Metadata. If there's
one thing scientists can add to maximize their
data's value, it's "metadata, metadata, metadata",
says environmental scientist Patricia Soranno at
Michigan State University in East Lansing.”]
Share in
chat
What are we looking at in this
table?
What questions or concerns
arise?
Document your variable names
● Intuitive / meaningful variable names e.g. study_id
● What do variable names mean?
● What does each variable contain?
● Is there a limited set of possible values?
Name | Field Type | Description | Possible values | Units
study_id | text | Unique ID of study | 8-digit number |
date_enrolled | date | Initial subject enrollment date | Date in format YYYY-MM-DD; all dates later than 2011-09-01 |
weight | integer | Weight of subject | | lbs
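Documented variables can also be kept in machine-readable form and used to check incoming records. The sketch below encodes the three variables from the table as Python rules. The DATA_DICTIONARY structure, the validate function, and the sample record are assumptions made for illustration.

```python
from datetime import date

# One entry per variable: a description and a check for allowed values.
DATA_DICTIONARY = {
    "study_id": {
        "description": "Unique ID of study",
        "check": lambda v: isinstance(v, str) and len(v) == 8 and v.isdigit(),
    },
    "date_enrolled": {
        "description": "Initial subject enrollment date",
        # fromisoformat raises ValueError unless the text is YYYY-MM-DD
        "check": lambda v: date.fromisoformat(v) > date(2011, 9, 1),
    },
    "weight": {
        "description": "Weight of subject (lbs)",
        "check": lambda v: isinstance(v, int),
    },
}

def validate(record: dict) -> list[str]:
    """Return the names of variables that are missing or fail their check."""
    problems = []
    for name, entry in DATA_DICTIONARY.items():
        try:
            ok = name in record and entry["check"](record[name])
        except (TypeError, ValueError):
            ok = False
        if not ok:
            problems.append(name)
    return problems

if __name__ == "__main__":
    # Hypothetical record with an enrollment date that is too early.
    record = {"study_id": "00012345", "date_enrolled": "2011-08-15", "weight": 172}
    print(validate(record))
```

Storing study_id as text rather than as a number keeps leading zeros intact, which matches the "text" field type in the table.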
Data Dictionaries
Data dictionaries provide detailed descriptions of the data in a dataset.
They help provide context about the structure and content of data. They
can also guide the process for collection and use of the data.
● List of all data objects
● Description of data elements (size, type, classification, etc.)
● Relationships to other data
● Variables and coding information
● Creation date
Public Use Microdata Sample Documentation (US Census)
https://www.usgs.gov/centers/eros/science/landsat-data-dictionary
What do we need to know to use your data?
● Where to find it
● How to access it
● What can it be used for?
● Known problems, inconsistencies, limitations
● Collection methods, units of measure, variable names
● Data integrity
● Ethical/privacy restrictions
● Licensing
● Who to cite
README files
Source: Quick & Dirty Data Management
README: Recommended practices
● Create 1 readme file for each data file/dataset
● Name the readme so that it’s easily associated with the datafile(s) it
describes
● Write your readme document as a plain text file
● Format multiple readme files identically (Tip: use a template - create or
find one!)
● Follow the conventions for your discipline
Source: Quick & Dirty Data Management
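A template helps keep readme files identical in format across datasets. The sketch below writes a plain-text readme next to a data file. The section headings, the _README.txt suffix, and the function name write_readme are assumptions, so substitute the conventions of your discipline.

```python
from pathlib import Path
import tempfile

# Hypothetical template; the section headings are placeholders.
TEMPLATE = """\
README for: {data_file}

Description:
Creator(s):
Date collected (YYYY-MM-DD):
Collection methods:
Variables and units:
Known problems or limitations:
License and citation:
"""

def write_readme(data_file: Path) -> Path:
    """Create <stem>_README.txt beside data_file and return its path."""
    readme = data_file.with_name(f"{data_file.stem}_README.txt")
    readme.write_text(TEMPLATE.format(data_file=data_file.name), encoding="utf-8")
    return readme

if __name__ == "__main__":
    # Demo in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_readme(Path(tmp) / "AtherRat_012_056_mb_0423_raw.csv")
        print(path.name)
        print(path.read_text(encoding="utf-8"))
```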
● Pick 1 project, implement 1 new tactic
● Choose 1 aspect of a file naming convention and apply it to your files
● Plan out folder names and hierarchy before starting a new project
● Do a scan to figure out where all your data files are & document everything in a readme file
● Review current resources/tools for where you can gather metadata and documentation
without changing your workflow
● Weave in recommended practices where you can; practice leads to improvement!
● Get an organization accountability buddy & check in occasionally
Ways to get started
Consider your
situation and goals!
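The scan suggested in the list above can be partly automated. The sketch below walks a folder and counts files by extension as a starting point for a readme. The function name inventory and the demo files are hypothetical.

```python
from collections import Counter
from pathlib import Path
import tempfile

def inventory(root: Path) -> Counter:
    """Count files under root by lowercase extension."""
    return Counter(
        p.suffix.lower() or "(none)"
        for p in root.rglob("*")
        if p.is_file()
    )

if __name__ == "__main__":
    # Demo on a few throwaway files in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "images").mkdir()
        for name in ["images/a.tif", "images/b.TIF", "notes.txt"]:
            (root / name).write_text("demo")
        for ext, count in sorted(inventory(root).items()):
            print(ext, count)
```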
Check yourself!
1. Can you easily locate & understand the raw data?
2. Can you connect different types of related data you collected?
3. If a file gets misplaced, can you put it back in the correct folder?
4. Are your naming conventions consistent with others on your team?
5. If another researcher were to ask you for a copy of your data 5 years
after the close of your project, would you be able to easily find it and
send it to them? Could all the members of your research team find it?
6. If a researcher were to receive a copy of your data, would they be able
to use it without asking you too many questions?
What’s worked
for you?
(Tools, systems, getting started, etc.)
Share in
chat
Resources
● U-M Library data-related guides:
○ Social Sciences
○ Engineering
○ Health Sciences
○ Research Data (General)
● U-M Library Data Services
● Reach out with questions or
request a consultation:
○ U-M Library Research Data Services:
researchdataservices@umich.edu
○ Taubman Health Sciences Library
Data Team:
THLResearchDataCore@umich.edu
[Insert advertisement for additional data workshops.
Include URL and/or QR code for registration.]
Thanks!
Questions?
[Presenter name, email]
Parts of this presentation were adapted from presentations by:
● NYU Langone Health RDM Teaching Toolkit slides by
Kevin Read & Alisa Surkis.
● Marisa Conte, from when she was a THSL informationist.
● DataONE Education Module: Data Management.

More Related Content

Similar to ManagingOrganizingData_ReusableSlides.ppt

Meeting the NSF DMP Requirement June 13, 2012
Meeting the NSF DMP Requirement June 13, 2012Meeting the NSF DMP Requirement June 13, 2012
Meeting the NSF DMP Requirement June 13, 2012
IUPUI
 
Data Extraction
Data ExtractionData Extraction
RDM for Librarians
RDM for LibrariansRDM for Librarians
RDM for Librarians
Marieke Guy
 
Research Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and HumanitiesResearch Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and Humanities
Rebekah Cummings
 
Johnston - How to Curate Research Data
Johnston - How to Curate Research DataJohnston - How to Curate Research Data
Johnston - How to Curate Research Data
National Information Standards Organization (NISO)
 
Data Management Planning for researchers
Data Management Planning for researchersData Management Planning for researchers
Data Management Planning for researchers
Sarah Jones
 
Research Lifecycles and RDM
Research Lifecycles and RDMResearch Lifecycles and RDM
Research Lifecycles and RDM
Marieke Guy
 
Introduction to RDM for trainee physicians
Introduction to RDM for trainee physiciansIntroduction to RDM for trainee physicians
Introduction to RDM for trainee physicians
Historic Environment Scotland
 
Research data management workshop april12 2016
Research data management workshop april12 2016 Research data management workshop april12 2016
Research data management workshop april12 2016
Rebecca Raworth, MLIS
 
Research data management workshop April 2016
Research data management workshop April 2016Research data management workshop April 2016
Research data management workshop April 2016
Rebecca Raworth, MLIS
 
Make your data great now
Make your data great nowMake your data great now
Make your data great now
Daniel JACOB
 
RDM for trainee physicians
RDM for trainee physiciansRDM for trainee physicians
RDM for trainee physicians
Historic Environment Scotland
 
Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsf
Brad Houston
 
Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsf
Brad Houston
 
Organising and Documenting Data
Organising and Documenting DataOrganising and Documenting Data
Organising and Documenting Data
EDINA, University of Edinburgh
 
Research data management : [part of] PROOF course Finding and controlling sci...
Research data management : [part of] PROOF course Finding and controlling sci...Research data management : [part of] PROOF course Finding and controlling sci...
Research data management : [part of] PROOF course Finding and controlling sci...
Leon Osinski
 
Unit_8_Data_processing,_analysis_and_presentation_and_Application (1).pptx
Unit_8_Data_processing,_analysis_and_presentation_and_Application (1).pptxUnit_8_Data_processing,_analysis_and_presentation_and_Application (1).pptx
Unit_8_Data_processing,_analysis_and_presentation_and_Application (1).pptx
tesfkeb
 
Intro to RDM
Intro to RDMIntro to RDM
Intro to RDM
Sarah Jones
 
Data management plans
Data management plansData management plans
Data management plans
Brad Houston
 
Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)
aaroncollie
 

Similar to ManagingOrganizingData_ReusableSlides.ppt (20)

Meeting the NSF DMP Requirement June 13, 2012
Meeting the NSF DMP Requirement June 13, 2012Meeting the NSF DMP Requirement June 13, 2012
Meeting the NSF DMP Requirement June 13, 2012
 
Data Extraction
Data ExtractionData Extraction
Data Extraction
 
RDM for Librarians
RDM for LibrariansRDM for Librarians
RDM for Librarians
 
Research Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and HumanitiesResearch Data Management and Sharing for the Social Sciences and Humanities
Research Data Management and Sharing for the Social Sciences and Humanities
 
Johnston - How to Curate Research Data
Johnston - How to Curate Research DataJohnston - How to Curate Research Data
Johnston - How to Curate Research Data
 
Data Management Planning for researchers
Data Management Planning for researchersData Management Planning for researchers
Data Management Planning for researchers
 
Research Lifecycles and RDM
Research Lifecycles and RDMResearch Lifecycles and RDM
Research Lifecycles and RDM
 
Introduction to RDM for trainee physicians
Introduction to RDM for trainee physiciansIntroduction to RDM for trainee physicians
Introduction to RDM for trainee physicians
 
Research data management workshop april12 2016
Research data management workshop april12 2016 Research data management workshop april12 2016
Research data management workshop april12 2016
 
Research data management workshop April 2016
Research data management workshop April 2016Research data management workshop April 2016
Research data management workshop April 2016
 
Make your data great now
Make your data great nowMake your data great now
Make your data great now
 
RDM for trainee physicians
RDM for trainee physiciansRDM for trainee physicians
RDM for trainee physicians
 
Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsf
 
Data management plans (dmp) for nsf
Data management plans (dmp) for nsfData management plans (dmp) for nsf
Data management plans (dmp) for nsf
 
Organising and Documenting Data
Organising and Documenting DataOrganising and Documenting Data
Organising and Documenting Data
 
Research data management : [part of] PROOF course Finding and controlling sci...
Research data management : [part of] PROOF course Finding and controlling sci...Research data management : [part of] PROOF course Finding and controlling sci...
Research data management : [part of] PROOF course Finding and controlling sci...
 
Unit_8_Data_processing,_analysis_and_presentation_and_Application (1).pptx
Unit_8_Data_processing,_analysis_and_presentation_and_Application (1).pptxUnit_8_Data_processing,_analysis_and_presentation_and_Application (1).pptx
Unit_8_Data_processing,_analysis_and_presentation_and_Application (1).pptx
 
Intro to RDM
Intro to RDMIntro to RDM
Intro to RDM
 
Data management plans
Data management plansData management plans
Data management plans
 
Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)Data Management for Research (New Faculty Orientation)
Data Management for Research (New Faculty Orientation)
 

Recently uploaded

AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
Leonel Morgado
 
Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
PirithiRaju
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
PirithiRaju
 
Tissue fluids_etiology_volume regulation_pressure.pptx
Tissue fluids_etiology_volume regulation_pressure.pptxTissue fluids_etiology_volume regulation_pressure.pptx
Tissue fluids_etiology_volume regulation_pressure.pptx
muralinath2
 
Alternate Wetting and Drying - Climate Smart Agriculture
Alternate Wetting and Drying - Climate Smart AgricultureAlternate Wetting and Drying - Climate Smart Agriculture
Alternate Wetting and Drying - Climate Smart Agriculture
International Food Policy Research Institute- South Asia Office
 
Microbiology of Central Nervous System INFECTIONS.pdf
Microbiology of Central Nervous System INFECTIONS.pdfMicrobiology of Central Nervous System INFECTIONS.pdf
Microbiology of Central Nervous System INFECTIONS.pdf
sammy700571
 
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
eitps1506
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
Carl Bergstrom
 
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills MN
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
Frédéric Baudron
 
fermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptxfermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptx
ananya23nair
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
University of Hertfordshire
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Sérgio Sacani
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Leonel Morgado
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
Sérgio Sacani
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
hozt8xgk
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
PirithiRaju
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
PirithiRaju
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
Sérgio Sacani
 

Recently uploaded (20)

AJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdfAJAY KUMAR NIET GreNo Guava Project File.pdf
AJAY KUMAR NIET GreNo Guava Project File.pdf
 
Immersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths ForwardImmersive Learning That Works: Research Grounding and Paths Forward
Immersive Learning That Works: Research Grounding and Paths Forward
 
Methods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdfMethods of grain storage Structures in India.pdf
Methods of grain storage Structures in India.pdf
 
Pests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdfPests of Storage_Identification_Dr.UPR.pdf
Pests of Storage_Identification_Dr.UPR.pdf
 
Tissue fluids_etiology_volume regulation_pressure.pptx
Tissue fluids_etiology_volume regulation_pressure.pptxTissue fluids_etiology_volume regulation_pressure.pptx
Tissue fluids_etiology_volume regulation_pressure.pptx
 
Alternate Wetting and Drying - Climate Smart Agriculture
Alternate Wetting and Drying - Climate Smart AgricultureAlternate Wetting and Drying - Climate Smart Agriculture
Alternate Wetting and Drying - Climate Smart Agriculture
 
Microbiology of Central Nervous System INFECTIONS.pdf
Microbiology of Central Nervous System INFECTIONS.pdfMicrobiology of Central Nervous System INFECTIONS.pdf
Microbiology of Central Nervous System INFECTIONS.pdf
 
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
CLASS 12th CHEMISTRY SOLID STATE ppt (Animated)
 
The cost of acquiring information by natural selection
The cost of acquiring information by natural selectionThe cost of acquiring information by natural selection
The cost of acquiring information by natural selection
 
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
Travis Hills of MN is Making Clean Water Accessible to All Through High Flux ...
 
Farming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptxFarming systems analysis: what have we learnt?.pptx
Farming systems analysis: what have we learnt?.pptx
 
fermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptxfermented food science of sauerkraut.pptx
fermented food science of sauerkraut.pptx
 
Applied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdfApplied Science: Thermodynamics, Laws & Methodology.pdf
Applied Science: Thermodynamics, Laws & Methodology.pdf
 
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...
 
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
Describing and Interpreting an Immersive Learning Case with the Immersion Cub...
 
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
EWOCS-I: The catalog of X-ray sources in Westerlund 1 from the Extended Weste...
 
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
快速办理(UAM毕业证书)马德里自治大学毕业证学位证一模一样
 
11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf11.1 Role of physical biological in deterioration of grains.pdf
11.1 Role of physical biological in deterioration of grains.pdf
 
Gadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdfGadgets for management of stored product pests_Dr.UPR.pdf
Gadgets for management of stored product pests_Dr.UPR.pdf
 
The binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defectsThe binding of cosmological structures by massless topological defects
The binding of cosmological structures by massless topological defects
 

ManagingOrganizingData_ReusableSlides.ppt

  • 1. Data, data everywhere! Managing & organizing data [Date] [Name of Presenter] [Library logo] This work is licensed under CC BY 4.0 and is adapted from the University of Michigan Library presentation at https://doi.org/10.7302/8684
  • 2. Technology Overview ● Live transcription is available ● Recording is on ● Remain muted during presentation ● Use chat to share questions and comments during the presentation ● Evaluation at the end ● Email follow-up with slides & recording
  • 3. Learning Objectives ● Recognize why organizing your data is important for enabling high quality research. ● Understand recommended practices for keeping your data organized. ● Identify ways to describe your data. ● Be able to experiment with practical strategies to start organizing your data.
  • 4. Have you ever lost some data or notes because of lack of organization? Poll
  • 5. What is data? ● Text ● Numerical ● Multimedia ● Models ● Software Raw Data → Analyzable Data → Shareable Data
  • 6. Categories of data ● Observational: sensor readings, telemetry, survey results, videos, images ● Experimental: gene sequences, chromatograms, magnetic field readings, and spectroscopy ● Simulation: climate models, economic models, and systems engineering ● Compiled: text and data mining, compiled database, systems engineering, and 3D models ● Reference: gene sequence databanks, census data, chemical structures Data Types & File Formats
  • 7. Why is organizing and managing data important?
  • 8. 1 rat heart 100s of slices 100s of slides 1000s of image files TIF TIF TIF TIF TIF TIF 100s of huge images 3 postdocs 5-7 experiments a week…
  • 9. Data management for yourself ● Efficiently find your files ● Track your methods for reproducibility ● Better version control of data ● Quality control ● Avoid data loss ● Document your data for your own recollection, accountability, and re-use ● Gain credibility and recognition for your science efforts through data sharing!
  • 10. Data management for science ● Data is a valuable asset. It is expensive and time-consuming to collect, so take good care of it! ● Well-managed data: ○ improves the quality, accuracy, and integrity of your research ○ maximizes the effective use of data ○ ensures appropriate use of data and information ○ strengthens the reliability of the research: promotes transparency, encourages accountability, reduces bias and errors ○ ensures sustainability and accessibility, allowing others to reproduce your findings
  • 11. Research reproducibility ● Reproducibility is the ability for other researchers to reach similar results when using the same methods and data. ○ Replicability is achieving similar results by conducting a new study with different methods or approaches ● Accurate, comprehensive, and transparent reporting allows for reproducibility.
  • 12. Data management challenges ● Good data management takes time and planning ● Researchers may lack knowledge about best practices in handling data ● General lack of incentives for doing good data management ● There can be a cost to managing data (human, technology, etc.)
  • 13. Poor data management affects everyone “MEDICARE PAYMENT ERRORS NEAR $20B” (CNN) December 2004. Miscoding and billing errors from doctors and hospitals totaled $20 billion in FY 2003 (9.3% error rate). The error rate measured claims that were paid despite being medically unnecessary, inadequately documented, or improperly coded. This error rate actually was an improvement over the previous fiscal year (9.8% error rate). “SOCIAL SECURITY DATA CAN TURN PEOPLE INTO THE LIVING DEAD” (NPR) August 2016. In 2011, an audit found that about 1,000 people a month in the U.S. were marked deceased when they were very much alive. Rona Lawson, who works in the Office of the Inspector General at the Social Security Administration, says that number has gone down. It's now around 500 people a month. Lawson says 90 percent of the time, the cascade of misinformation starts with an input error by Social Security staff — a regular mistake on a regular office day that just happens to kill a person off, at least on paper."
  • 14. The climate scientists at the centre of a media storm over leaked emails were cleared of accusations that they manipulated their results and silenced critics, but a review found they had failed to be open enough about their work. [Image removed for copyright purposes. Alt text: Screenshot of article from The Guardian UK titled "Climategate scientists cleared of manipulating data on global warming”]
  • 15. “had published duplicate pictures in several cases and had repeatedly failed to exert due diligence in organising her area of study over a long period of time.” http://retractionwatch.com/2016/11/02/leading-diabetes-researcher-acted-negligently-probe-concludes/ [Image removed for copyright purposes. Alt text: Screenshot of Retraction Watch announcement titled "Leading diabetes researcher acted negligently, probe concludes”]
  • 16. Don't end up here! NEJM paper on sleep apnea retracted when original data can’t be found ● Multiple errors in table ● Did not alter conclusions in article ● BUT, could not locate primary data [Image removed for copyright purposes. Alt text: Screenshot of a Retraction Watch announcement titled "NEJM paper on sleep apnea retracted when original data can't be found”]
  • 17. Data management and documentation could have prevented these problems!
  • 18. How can I manage & organize my data?
  • 19. Keys to success ● Communicate thoroughly ○ Provide training ○ Make documentation findable ○ Use a variety of media (email, Slack, lab manual) ○ Regularly check in with and/or remind all contributors ● Maintain good documentation ○ Readme files ○ Data management plan (DMP) If you can, assign one person to be responsible
  • 20. Organizing tactics 1. Identify storage solution(s) 2. Establish directory/folder structures 3. Develop file naming convention(s) 4. Decide on file formats
  • 21. 1. Identify storage solution(s) ● May have multiple storage locations ● Investigate ITS data storage finder: Data Storage Finder / U-M Information and Technology Services ● Set up appropriate access/permissions Document!
  • 22. 2. Establish directory/folder structures ● Organize directories hierarchically ● Group files of similar information together in a single directory ● Name directories after aspects of the project rather than individual researchers ● Separate ongoing and completed work ● Once you have decided on a directory structure, follow it consistently and audit it periodically Document!
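One way to follow a directory structure consistently is to create it from a script at the start of each project. The sketch below is only an illustration: the folder names in `LAYOUT` are hypothetical, loosely modeled on the Data/Documentation example described on the next slide, and should be replaced with whatever structure your team has documented.

```python
from pathlib import Path

# Hypothetical layout; adapt the names to your own documented structure.
LAYOUT = {
    "data": ["raw", "processed"],
    "documentation": ["consent_forms", "methodology"],
    "results": [],
}


def create_project_tree(root: Path, layout: dict = LAYOUT) -> list:
    """Create the folder hierarchy under `root` and return the directories."""
    created = []
    for top_level, children in layout.items():
        top_dir = root / top_level
        # exist_ok=True makes the script safe to re-run on an existing project.
        top_dir.mkdir(parents=True, exist_ok=True)
        created.append(top_dir)
        for child in children:
            sub_dir = top_dir / child
            sub_dir.mkdir(exist_ok=True)
            created.append(sub_dir)
    return created
```

Calling `create_project_tree(Path("my_project"))` builds the same skeleton every time, which makes periodic audits easier because every project starts from the same shape.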
  • 23. Organising — UK Data Service Good data practices - Dryad [Image removed for copyright purposes. Alt text: A screenshot showing two ways to organize folders and files. The first is Organized by file type with one folder for data, both processed and raw, and another folder for results. The second way is organized by analysis where there are two folders, figure 1 and figure 2, and within each of those folders are the corresponding data and results folders.] [Image removed for copyright purposes. Alt text: A screenshot of a series of folders showing how they are organized. The top level includes a Data folder and a Documentation folder. Within the Data folder, there are folders for Databases, Images, Models, Sound, and Text. Within the Documentation folder there are folders for Consent Forms, Information Sheets, and Methodology.]
  • 24. 3. Develop file naming convention(s) File names should: ● embody the content of the file ● have intuitive (non-cryptic) names where possible ● be extensible ● be unique, where possible and practical ● not use special characters – restrict file names to numbers, letters, and underscores ● be named using consistent, documentable rules Document! May need to include: ● versioning ● multiple conventions
  • 25. Implement version control (versioning) Piled Higher and Deeper by Jorge Cham - PhDComics [Comic removed for copyright purposes]
  • 26. ISO 8601 - Formatting Dates Standard way to format date and time - extremely helpful for file naming conventions YYYY-MM-DD YYYYMMDD ISO 8601 — Date and time format
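As a small illustration of why ISO 8601 dates help in file names, the sketch below formats dates in both forms shown on the slide using Python's standard `datetime` module. The `_survey.csv` suffix and the sample dates are made up for the example.

```python
from datetime import date


def iso_extended(d: date) -> str:
    """Return the date as YYYY-MM-DD."""
    return d.isoformat()


def iso_basic(d: date) -> str:
    """Return the date as YYYYMMDD."""
    return d.strftime("%Y%m%d")


# Hypothetical file names: with the date first, alphabetical order is also
# chronological order.
dates = [date(2024, 1, 15), date(2023, 11, 2), date(2023, 4, 7)]
names = [f"{iso_basic(d)}_survey.csv" for d in dates]
sorted_names = sorted(names)
```

Because the year comes first and each part is zero-padded, a plain alphabetical sort in any file browser lists the files from oldest to newest.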
  • 27. Examples AtherRat_012_056_mb_0423_raw.csv AtherRat = experiment name 012 = experiment number 056 = sample number mb = stain used, methylene blue 0423 = 2-digit coordinates of image (4 across, 23 down) raw = data stage File naming and organization of data - University of Ottawa Library [Image removed for copyright purposes. Alt text: Chart showing a filename Project_YYYYMMDD_ContentDescription_Version.ext. The parts of the filename are defined as the Project name, date, description of file content, and version information. Underscores are used to separate the different parts of the filename.]
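A consistent naming convention also means file names can be read by a script. The following sketch parses the `AtherRat_012_056_mb_0423_raw.csv` example into its parts. The regular expression and the field names (`experiment`, `across`, `down`, and so on) are assumptions made for this illustration, not part of the original convention.

```python
import re

# Assumed pattern for the example convention:
# name_experiment#_sample#_stain_coordinates_stage.extension
FILENAME_PATTERN = re.compile(
    r"^(?P<experiment>[A-Za-z]+)"
    r"_(?P<experiment_number>\d{3})"
    r"_(?P<sample_number>\d{3})"
    r"_(?P<stain>[a-z]+)"
    r"_(?P<across>\d{2})(?P<down>\d{2})"
    r"_(?P<stage>[A-Za-z]+)"
    r"\.(?P<extension>\w+)$"
)


def parse_filename(name: str) -> dict:
    """Split a file name into its documented parts, or raise ValueError."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the naming convention")
    parts = match.groupdict()
    # Image coordinates are stored as two 2-digit numbers; convert to integers.
    parts["across"] = int(parts["across"])
    parts["down"] = int(parts["down"])
    return parts
```

A check like this can be run over a whole folder to flag files that drifted from the convention.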
  • 28. 4. Decide on file formats Whenever possible, select file formats that are: ● non-proprietary ● unencrypted ● uncompressed ● in common usage by the research community ● adherent to an open, documented standard ● interoperable among diverse platforms and applications ● royalty-free and without intellectual property restrictions ● developed and maintained by an established open standards organization Consider instrument/device settings Document!
  • 29. Recommended file formats Audio: WAVE, AIFF, MP3, MXF Containers: TAR, GZIP, ZIP Databases: XML, CSV Statistics: ASCII, DTA, POR, SAS, SAV Still images: TIFF, JPEG 2000, PNG, GIF Tabular data: CSV Text (documentation, scripts): XML, PDF/A, Plain Text (ASCII, UTF-8) Video: MOV, MPEG, AVI, MXF Recommended Formats Statement – table of contents | Resources (Preservation, Library of Congress)
  • 30. How should I describe my data?
  • 31. Documentation Good data documentation helps ensure accurate reporting of data and methods: ● Project level - describes the procedures and methods used for data collection and analysis (workflow, protocols, instruments, etc.) ● Metadata - data about your data ● Data dictionaries - describe the variables in a dataset ● README files - dataset description, guide to files and technology
  • 32. Workflows The meaning of data depends on the context of how it was collected. [Decorative workflow image removed for copyright purposes.]
  • 33. Internal workflow ● How and when will the work be done? ● Will data be reviewed for quality? ● Who manages the entire process?
  • 34. What is metadata? It's the “data about data.” Structured (may follow documentation standards) information that makes it easier to retrieve, use, or manage data. It can include things like: ● Dates, times, locations ● Hardware/software information and parameters ● Methodology ● Creators ● Copyright ● Formats
  • 35. Perkel, J. M. (2023). How to make your scientific data accessible, discoverable and useful. Nature, 618(7967), 1098–1099. https://doi.org/10.1038/d41586-023-01929-7 [Image removed for copyright purposes. Alt text: Screenshot of a section of the article that says: "Nature asked data scientists about their best practices for publishing usable, high-quality data - here's what they said. Craft Metadata. If there's one thing scientists can add to maximize their data's value, it's "metadata, metadata, metadata", says environmental scientist Patricia Soranno at Michigan State University in East Lansing.”]
  • 36. Share in chat What are we looking at in this table? What questions or concerns arise?
  • 37. Document your variable names ● Intuitive / meaningful variable names e.g. study_id ● What do variable names mean? ● What does each variable contain? ● Is there a limited set of possible values?
      Name | Field Type | Description | Possible values | Units
      study_id | text | Unique ID of study | 8-digit number |
      date_enrolled | date | Initial subject enrollment date | Date in format YYYY-MM-DD; all dates later than 2011-09-01 |
      weight | integer | Weight of subject | | lbs
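Once variables are documented this way, the documented rules can be checked automatically. The sketch below validates a single record against the three example variables on this slide. The function name `check_record` and the wording of the messages are invented for illustration, and it assumes "later than 2011-09-01" excludes that date itself.

```python
import re
from datetime import date

# Earliest allowed enrollment date, taken from the example table.
EARLIEST_ENROLLMENT = date(2011, 9, 1)


def check_record(record: dict) -> list:
    """Return a list of problems found in one record (empty if none)."""
    problems = []

    study_id = record.get("study_id")
    if not (isinstance(study_id, str) and re.fullmatch(r"\d{8}", study_id)):
        problems.append("study_id must be an 8-digit number stored as text")

    raw_date = record.get("date_enrolled")
    if not (isinstance(raw_date, str)
            and re.fullmatch(r"\d{4}-\d{2}-\d{2}", raw_date)):
        problems.append("date_enrolled must use the format YYYY-MM-DD")
    else:
        try:
            enrolled = date.fromisoformat(raw_date)
        except ValueError:
            problems.append("date_enrolled is not a real calendar date")
        else:
            if enrolled <= EARLIEST_ENROLLMENT:
                problems.append("date_enrolled must be later than 2011-09-01")

    weight = record.get("weight")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(weight, int) or isinstance(weight, bool):
        problems.append("weight must be an integer number of pounds")

    return problems
```

For example, `check_record({"study_id": "00012345", "date_enrolled": "2012-03-14", "weight": 172})` returns an empty list, while a record with a 4-digit ID would be flagged.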
  • 38. Data Dictionaries Data dictionaries provide detailed descriptions of the data in a dataset. They help provide context about the structure and content of data. They can also guide the process for collection and use of the data. ● List of all data objects ● Description of data elements (size, type, classification, etc.) ● Relationships to other data ● Variables and coding information ● Creation date
  • 39. Public Use Microdata Sample Documentation (US Census)
  • 41. What do we need to know to use your data? ● Where to find it ● How to access it ● What can it be used for? ● Known problems, inconsistencies, limitations ● Collection methods, units of measure, variable names ● Data integrity ● Ethical/privacy restrictions ● Licensing ● Who to cite README files Source: Quick & Dirty Data Management
  • 42. README: Recommended practices ● Create 1 readme file for each data file/dataset ● Name the readme so that it’s easily associated with the datafile(s) it describes ● Write your readme document as a plain text file ● Format multiple readme files identically (Tip: use a template - create or find one!) ● Follow the conventions for your discipline Source: Quick & Dirty Data Management
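To make the template tip concrete, here is a minimal sketch that names a plain-text readme after the data file it describes and fills it with empty section headings. The `_README.txt` suffix and the heading list are assumptions for this example (the headings paraphrase the "What do we need to know to use your data?" slide); follow your discipline's conventions where they exist.

```python
from pathlib import Path

# Section headings paraphrased from the previous slide; adjust as needed.
SECTIONS = [
    "Where to find the data",
    "How to access it",
    "What it can be used for",
    "Known problems, inconsistencies, limitations",
    "Collection methods, units of measure, variable names",
    "Ethical/privacy restrictions",
    "Licensing",
    "Who to cite",
]


def readme_path_for(data_file: Path) -> Path:
    """Return a readme path that sits next to, and is named after, the data file."""
    return data_file.with_name(f"{data_file.stem}_README.txt")


def readme_skeleton(data_file: Path) -> str:
    """Return an identically formatted plain-text skeleton for any data file."""
    lines = [f"README for {data_file.name}", ""]
    for section in SECTIONS:
        lines += [section.upper(), "(fill in)", ""]
    return "\n".join(lines)
```

Writing the result with `readme_path_for(f).write_text(readme_skeleton(f), encoding="utf-8")` gives every dataset a readme with the same layout.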
  • 44. Ways to get started ● Pick 1 project, implement 1 new tactic ● Choose 1 aspect of a file naming convention and apply it to your files ● Plan out folder names and hierarchy before starting a new project ● Do a scan to figure out where all your data files are & document everything in a readme file ● Review current resources/tools for where you can gather metadata and documentation without changing your workflow ● Weave in recommended practices where you can; practice leads to improvement! ● Get an organization accountability buddy & check in occasionally Consider your situation and goals!
  • 45. Check yourself! 1. Can you easily locate & understand the raw data? 2. Can you connect different types of related data you collected? 3. If a file gets misplaced, can you put it back in the correct folder? 4. Are your naming conventions consistent with others on your team? 5. If another researcher were to ask you for a copy of your data 5 years after the close of your project, would you be able to easily find it and send it to them? Could all the members of your research team find it? 6. If a researcher were to receive a copy of your data, would they be able to use it without asking you too many questions?
  • 46. What’s worked for you? (Tools, systems, getting started, etc.) Share in chat
  • 47. Resources ● U-M Library data-related guides: ○ Social Sciences ○ Engineering ○ Health Sciences ○ Research Data (General) ● U-M Library Data Services ● Reach out with questions or request a consultation: ○ U-M Library Research Data Services: researchdataservices@umich.edu ○ Taubman Health Sciences Library Data Team: THLResearchDataCore@umich.edu
  • 48. [Insert advertisement for additional data workshops. Include URL and/or QR code for registration.]
  • 49. Thanks! Questions? [Presenter name, email] Parts of this presentation were adapted from presentations by: ● NYU Langone Health RDM Teaching Toolkit slides by Kevin Read & Alisa Surkis. ● Marisa Conte, from when she was a THSL informationist. ● DataONE Education Module: Data Management.

Editor's Notes

  1. Have you ever lost some data or notes because of lack of organization? Yes No Maybe?