Data Organization
C. Tobin Magle, PhD
Feb. 28, 2017
10:00-11:30 a.m.
Morgan Library Computer Classroom 175
*inspired by content from Data Carpentry
The research cycle
[Figure: cycle diagram with the labels Hypothesis, Experimental design, Data, Results, Article, and Data Management Plans]
Main topics
• Hierarchical organization
• Folders in folders
• Open Science Framework
• File naming
• Human readability
• Machine readability
• “Tidy” data in spreadsheets
Folder systems
• Organize your data hierarchically
• Identify ways to divide your data into categories (attributes)
• The top level of the hierarchy should reflect the most important attribute
Hierarchical Organization
Putting your files into a folder system
my_project
    Data
    Notes
    protocols
    manuscripts
        Paper1
            Figures
            Text
            References
        Paper2
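A folder system like the one above can also be created by a script instead of by hand. The following is a minimal sketch, assuming Python 3 and its standard pathlib module. The folder names come from the slide; the nesting of Paper1 and Paper2 under manuscripts is an assumption, and the `create_project` helper and `PROJECT_LAYOUT` list are hypothetical names used only for illustration.

```python
from pathlib import Path

# Folder layout taken from the slide; nesting under "manuscripts" is assumed.
PROJECT_LAYOUT = [
    "Data",
    "Notes",
    "protocols",
    "manuscripts/Paper1/Figures",
    "manuscripts/Paper1/Text",
    "manuscripts/Paper1/References",
    "manuscripts/Paper2",
]


def create_project(base_dir, name="my_project"):
    """Create the project folder tree under base_dir and return its root."""
    root = Path(base_dir) / name
    for relative in PROJECT_LAYOUT:
        # parents=True builds intermediate folders; exist_ok=True makes reruns safe.
        (root / relative).mkdir(parents=True, exist_ok=True)
    return root


if __name__ == "__main__":
    import tempfile

    # Demonstrate in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        root = create_project(tmp)
        for folder in sorted(p.relative_to(root).as_posix() for p in root.rglob("*")):
            print(folder)
```

Keeping the layout in one list makes it easy to reuse the same structure for every new project, which supports the "be consistent" advice.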
Questions to ask
• What kinds of files are there? (See data inventory)
• How could you group them?
• Project?
• Time?
• Location?
• File type?
• What are the most important attributes?
Example: Lou the first year
Lou is a first-year graduate student working on a project in a biomedical research laboratory. He’s trying to decipher data left by a former postdoc as a start for his thesis project. For one year, the postdoc recorded weight daily and cytokine levels monthly from 16 mice. Half were infected with a parasite; half were treated with saline.
• What are the attributes of his project?
• How would you rank these attributes?
Attributes
• Time
• Infection Status
• Data Type
Exercise: Organize files
• Download Lou’s files (look in the README file for insight)
• http://tinyurl.com/hvna4mg
• Create a hierarchical folder structure for Lou
• Drag his files into the correct folders
• Fix Lou’s README
• Bonus: think about how you’d organize your data.
Tool: Open Science Framework
• Components
• Add-ons
• Contributors
• Wiki
http://help.osf.io/m/collaborating/l/524109-using-the-wiki
http://www.slideshare.net/DuraSpace/121014-slides-roadmap-to-the-future-of-share
Organization tips
• Be consistent
• One directory per project
• Separate components for
• Raw data
• Processed data
• Code
• Output
• Make raw data read-only
• Make README files
http://help.osf.io/m/60347/l/611391-organizing-files
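One way to act on the "make raw data read-only" tip is to clear the write permission on the files once they are saved. This is a minimal sketch, assuming Python 3; the `make_read_only` helper and the `raw_data` folder name in the usage note are hypothetical. On Windows, `chmod` only toggles the file's read-only flag, which is still enough for this purpose.

```python
import stat
from pathlib import Path


def make_read_only(folder):
    """Remove write permission from every file under folder; return the files changed."""
    changed = []
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            mode = path.stat().st_mode
            # Clear the write bits for owner, group, and others.
            path.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
            changed.append(path)
    return changed
```

For example, `make_read_only("my_project/raw_data")` would protect everything in that folder from accidental edits; processed data and output stay writable in their own folders.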
Components
• “Subprojects”
• Separate privacy settings, contributors, wiki, add-ons, and files
• Examples:
• Different projects: https://osf.io/82fba/
• Clinical: https://osf.io/gq4mz/
• Manuscript: https://osf.io/if7ug/
• Collaboration: https://osf.io/ezcuj/
Demo: Getting started with OSF
1. Create a project
2. Add components
3. Add files
Don’t panic!
• Just try something
• There’s no right answer
• Be consistent
• Write a README.txt file
http://4vector.com/i/free-vector-don-t-panic-clip-art_103946_Dont_Panic_clip_art_hight.png
File naming conventions
Make file names both human- and machine-readable.
Use descriptive names
• Bad name: file.txt
• Ok name: 05-07-2016-mouse-data.txt
• Good name: 2016-05-07-mouse-weight.tsv
• Human readability: name contains information about content
Go from general to specific
• Bad name: rep1-5-7-2016-gene-expression.csv
• Good name: 2016-05-07-gene-expression-rep1.csv
• Machine readability: can be sorted meaningfully
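The point about meaningful sorting can be checked directly. This is a small sketch, assuming Python 3; the file names are illustrative variations on the examples above, not Lou's actual files. Alphabetical sorting is what most file browsers and scripts apply by default.

```python
# File names are illustrative; only the date format differs between the two lists.
month_first = [
    "05-07-2016-mouse-data.txt",
    "01-15-2017-mouse-data.txt",
    "12-01-2015-mouse-data.txt",
]
year_first = [
    "2016-05-07-mouse-weight.tsv",
    "2017-01-15-mouse-weight.tsv",
    "2015-12-01-mouse-weight.tsv",
]

# Plain alphabetical sorting of each list.
print(sorted(month_first))  # 2017 file first, 2015 file last: not chronological
print(sorted(year_first))   # 2015, 2016, 2017: chronological
```

With the year first and zero-padded months and days, alphabetical order and chronological order are the same thing.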
Avoid abbreviations
• Bad name: “sprlbgp1”
• Good name: “spencer_lab_group_1”
• Human readability: no one understands your acronyms
Avoid spaces
• Alternatives
• Dashes-are-cool.txt
• I_also_like_underscores.txt
• CamelCaseIsNeatToo.txt
• Machine readability: spaces act as delimiters in command lines and many programming tools
• Human readability: the dash, underscore, or capital letter still delineates words
Avoid special characters
• Bad characters: ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ' "
• Machine readability: these characters can have special meanings in scripting languages
• Example: in a Unix shell, ~ expands to your home directory
• Alternatives: underscore (_), dash (-), dot (.)
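The rules about spaces and special characters can be applied automatically. This is a minimal sketch, assuming Python 3; the `safe_file_name` helper and the sample file name are hypothetical. It keeps only ASCII letters, digits, underscore, dash, and dot, so accented or non-Latin letters would also be replaced.

```python
import re


def safe_file_name(name, separator="_"):
    """Replace spaces and special characters in a file name with a separator."""
    # Keep letters, digits, dot, underscore, and dash; replace every other run.
    cleaned = re.sub(r"[^A-Za-z0-9._-]+", separator, name)
    # Avoid a dangling separator right before the file extension.
    cleaned = cleaned.replace(separator + ".", ".")
    # Trim separators left at either end of the name.
    return cleaned.strip(separator)


print(safe_file_name("mouse weights (final)!.csv"))  # mouse_weights_final.csv
```

Names that already follow the rules, such as 2016-05-07-mouse-weight.tsv, pass through unchanged.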
Be consistent
• Establishing standards makes data more findable
• Extending standards to everyone who works on a project is even better
Renaming files
• Ways to automate file renaming
• Bulk Rename Utility (Windows, free)
• Renamer 5 (Mac)
• PSRenamer (Linux, Mac, or Windows, free)
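A short script is another way to automate renaming. This is a minimal sketch, assuming Python 3 and assuming the old names start with a month-day-year date, as in the earlier "05-07-2016" example; the `iso_name` and `rename_all` helpers are hypothetical. It does not check for name collisions, so run it with `dry_run=True` first and review the planned changes.

```python
import re
from pathlib import Path

# Matches a leading month-day-year date such as 05-07-2016 or 5-7-2016.
DATE_PREFIX = re.compile(r"^(\d{1,2})-(\d{1,2})-(\d{4})-(.+)$")


def iso_name(name):
    """Return name with a leading M-D-YYYY date rewritten as YYYY-MM-DD, else unchanged."""
    match = DATE_PREFIX.match(name)
    if not match:
        return name
    month, day, year, rest = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}-{rest}"


def rename_all(folder, dry_run=True):
    """Rename files in folder; with dry_run=True only report the planned changes."""
    planned = []
    # sorted() lists the folder once, before any file is renamed.
    for path in sorted(Path(folder).iterdir()):
        new_name = iso_name(path.name)
        if path.is_file() and new_name != path.name:
            planned.append((path.name, new_name))
            if not dry_run:
                path.rename(path.with_name(new_name))
    return planned
```

Calling `rename_all(folder)` returns a list of (old name, new name) pairs without touching any file; passing `dry_run=False` performs the renames.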
Exercise: Rename Lou’s files
• Use descriptive names
• General to specific
• Avoid abbreviations, spaces and special characters
• Be consistent
Tidy data
How to organize your data efficiently in spreadsheets
Spreadsheets as lab notebook
• Color coding
• Formatting
• Notes
• Calculations
• Graphs/Tables
Downsides
• Computers don’t understand notes/formatting/color coding
• Calculations/Graphs/Tables in spreadsheets are inefficient
• “Tidy data” + automation = saved time
Using spreadsheets wisely
• Don’t put multiple tables in one sheet
• Don’t use multiple sheets
• Use descriptive field names
• Don’t mix notes and data
Tidy Data
1. Columns as variables
• Don’t combine multiple pieces of info in one column
2. Rows as observations
• One measured value per row
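The two rules above can be illustrated by reshaping a "wide" table into a tidy one. This is a minimal sketch, assuming Python 3; the mouse IDs, dates, weights, column names, and the `to_tidy` helper are all made up for illustration and are not taken from Lou's files.

```python
# Hypothetical "wide" table: one row per mouse, one column per weighing date.
wide_rows = [
    {"mouse_id": "M01", "2016-05-07": 21.3, "2016-05-08": 21.6},
    {"mouse_id": "M02", "2016-05-07": 19.8, "2016-05-08": 20.1},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Turn each non-id column into its own row (one observation per row)."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column != id_column:
                tidy.append({
                    id_column: row[id_column],
                    variable_name: column,
                    value_name: value,
                })
    return tidy


# Each printed record has three variables: mouse_id, date, and weight_g.
for record in to_tidy(wide_rows, "mouse_id", "date", "weight_g"):
    print(record)
```

In the wide table the date is hidden in the column headers; in the tidy version it is a variable of its own, so the data can be filtered, sorted, or plotted by date without manual rearranging.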
Exercise: Tidy Lou’s data
• Open MouseInventory.xls
• Is he using spreadsheets wisely?
• Is each column a variable?
• Is each row an observation?
• Open the January files for both weight and cytokines
• What variables are being measured (i.e., what columns should we have)?
• Can we combine some of these tables?
Exercise: Data carpentry ecology
• Lesson: http://www.datacarpentry.org/spreadsheet-ecology-lesson/
• File: https://ndownloader.figshare.com/files/2252083
• Goal: combine data from first 2 tabs into one table
• Make a new tab, don’t edit the raw data!
Example: Supplemental_data_1_xls
• https://figshare.com/articles/Supplemental_data_1_xls/4055544
• Description: “Table of the results given by HPLC analysis of the samples. Key: Rt, retention time; +, presence of peak; -, absence of peak.”
Example: cck8_xls
• https://figshare.com/articles/cck8_xls/3505772
• Description: “This data are from CCK-8 assay and ELISA.”
Need help?
• Email: tobin.magle@colostate.edu
• Data Management Services website: http://lib.colostate.edu/services/data-management
• Data Carpentry: http://www.datacarpentry.org/
• Software Carpentry: http://software-carpentry.org/

 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 

Data and Donuts: Data organization

  • 1. Data Organization C. Tobin Magle, PhD Feb. 28, 2017 10:00-11:30 a.m. Morgan Library Computer Classroom 175 *inspired by content from Data Carpentry
  • 3. Main topics • Hierarchical organization • Folders in folders • Open Science Framework • File naming • Human readability • Machine readability • “Tidy” data in spreadsheets
  • 4. Folder systems • Organize your data hierarchically • Identify ways to divide your data into categories (Attributes) • Top level organization is the most important attribute
  • 5. Hierarchical Organization Putting your files into a folder system my_project Data Notes protocols manuscripts Paper1 Figures Text References Paper2
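The folder system on this slide can be created with a short script. This is a minimal sketch: the folder names come from the slide, but the nesting (Figures, Text and References under Paper1) is one reading of the flattened diagram, and the temporary location used in the demo is an assumption.

```python
from pathlib import Path
import tempfile

# Folder names taken from the slide; the exact nesting is an assumed reading of the diagram.
LAYOUT = {
    "Data": {},
    "Notes": {},
    "protocols": {},
    "manuscripts": {
        "Paper1": {"Figures": {}, "Text": {}, "References": {}},
        "Paper2": {},
    },
}


def make_tree(root: Path, layout: dict) -> None:
    """Recursively create every folder named in `layout` under `root`."""
    for name, children in layout.items():
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        make_tree(folder, children)


if __name__ == "__main__":
    # Build the tree in a throwaway directory so nothing real is touched.
    base = Path(tempfile.mkdtemp()) / "my_project"
    make_tree(base, LAYOUT)
    for path in sorted(base.rglob("*")):
        print(path.relative_to(base))
```

Keeping the layout in one dictionary makes it easy to reuse the same structure for every new project, which supports the "be consistent" advice.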
  • 6. Questions to ask • What kinds of files are there? (See data inventory) • How could you group them? • Project? • Time? • Location? • File type? • What are the most important attributes?
  • 7. Example: Lou the first year Lou is a first year graduate student working on a project in a biomedical research laboratory. He’s trying to decipher data left by a former post doc as a start for his thesis project. For one year, the postdoc recorded weight daily and cytokine levels monthly from 16 mice. Half were infected with a parasite, half were treated with saline. • List the attributes of his project? • How would you rank these attributes?
  • 10. Example: Lou the first year Lou is a first year graduate student working on a project in a biomedical research laboratory. He’s trying to decipher data left by a former post doc as a start for his thesis project. For one year, the postdoc recorded weight daily and cytokine levels monthly from 16 mice. Half were infected with a parasite, half were treated with saline. • List the attributes of his project? • How would you rank these attributes? Attributes • Time • Infection Status • Data Type
  • 11. Exercise: Organize files • Download Lou’s files (look in the README file for insight) • http://tinyurl.com/hvna4mg • Create a hierarchical folder structure for Lou • Drag his files into the correct folders • Fix Lou’s README • Bonus: think about how you’d organize your data.
  • 12. Tool: Open Science Framework • Components • Add-ons • Contributors • Wiki http://help.osf.io/m/collaborating/l/524109-using-the-wiki http://www.slideshare.net/DuraSpace/121014-slides-roadmap-to-the-future-of-share
  • 13. Organization tips • Be consistent • One directory per project • Separate components for • Raw data • Processed data • Code • Output • Make raw data read-only • Make README files http://help.osf.io/m/60347/l/611391-organizing-files
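One way to follow the "make raw data read-only" tip is to clear the write permission on every file in the raw data folder. The sketch below uses only the Python standard library; the function name `make_read_only` is hypothetical. On Windows, `os.chmod` can only toggle the read-only flag, so the effect there is coarser than on Linux or Mac.

```python
import os
import stat
from pathlib import Path


def make_read_only(folder: Path) -> int:
    """Remove write permission from every file under `folder`; return the file count."""
    write_bits = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH
    count = 0
    for path in folder.rglob("*"):
        if path.is_file():
            mode = path.stat().st_mode
            # Keep all existing permissions except the three write bits.
            os.chmod(path, mode & ~write_bits)
            count += 1
    return count
```

A call such as `make_read_only(Path("my_project/Data/raw"))` would protect an assumed raw data folder; processed data, code and output stay writable in their own components.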
  • 14. Components • “Subprojects” • Separate privacy settings, contributors, wiki, add-ons, and files. • Examples: • Different projects: https://osf.io/82fba/ • Clinical: https://osf.io/gq4mz/ • Manuscript: https://osf.io/if7ug/ • Collaboration: https://osf.io/ezcuj/
  • 15. Demo: Getting started with OSF 1. Create a project 2. Add components 3. Add files
  • 16. Don’t panic! • Just try something • There’s no right answer • Be consistent • Write a README.txt file http://4vector.com/i/free-vector-don-t-panic-clip-art_103946_Dont_Panic_clip_art_hight.png
  • 17. File naming conventions Make file names both human- and machine-readable.
  • 18. Use descriptive names • Bad name: file.txt • OK name: 05-07-2016-mouse-data.txt • Good name: 2016-05-07-mouse-weight.tsv • Human readability: the name contains information about the content
  • 19. Go from general to specific • Bad name: rep1-5-7-2016-gene-expression.csv • Good name: 2016-05-07-gene-expression-rep1.csv • Machine readability: can be sorted meaningfully
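To see why the general-to-specific order sorts meaningfully, the sketch below rewrites names of the "bad" form into the "good" form and sorts both versions. It assumes the date in the bad name is month-day-year, as the slide's own pair of names implies; the function name `to_iso_first` and the two extra file names are made up for illustration.

```python
import re


def to_iso_first(name: str) -> str:
    """Move an ISO date (YYYY-MM-DD) to the front and the replicate label to the end.

    Expects names like 'rep1-5-7-2016-gene-expression.csv' (month-day-year assumed).
    Names that do not fit the pattern are returned unchanged.
    """
    match = re.fullmatch(r"(rep\d+)-(\d{1,2})-(\d{1,2})-(\d{4})-(.+)\.(\w+)", name)
    if match is None:
        return name
    rep, month, day, year, topic, ext = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}-{topic}-{rep}.{ext}"


if __name__ == "__main__":
    names = [
        "rep1-5-7-2016-gene-expression.csv",   # from the slide
        "rep1-11-3-2016-gene-expression.csv",  # invented for the demo
        "rep2-12-1-2015-gene-expression.csv",  # invented for the demo
    ]
    print(sorted(names))                           # grouped by replicate, dates out of order
    print(sorted(to_iso_first(n) for n in names))  # chronological order
```

Because the year, month and day are zero-padded and ordered from largest unit to smallest, a plain alphabetical sort is also a chronological sort.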
  • 20. Avoid abbreviations • Bad name: “sprlbgp1” • Good name: “spencer_lab_group_1” • Human readability: no one understands your acronyms
  • 21. Avoid spaces • Alternatives • Dashes-are-cool.txt • I_also_like_underscores.txt • CamelCaseIsNeatToo.txt • Machine readability: spaces act as delimiters on the command line and in many scripts • Human readability: a separator still delineates words
  • 22. Avoid special characters • Bad characters: ~ ! @ # $ % ^ & * ( ) ` ; < > ? , [ ] { } ' " • Machine readability: can have special meanings in scripting languages • Example: in Unix shells, ~ expands to your home directory • Alternatives: underscore (_), dash (-), dot (.)
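The two rules above (no spaces, no special characters) can be applied mechanically. This is a minimal sketch: the character list is copied from the slide, while the function name `safe_name` and the choice of underscore as the replacement for spaces are assumptions.

```python
import re

# Characters the slide lists as risky in file names.
BAD_CHARS = "~!@#$%^&*()`;<>?,[]{}'\""


def safe_name(name: str) -> str:
    """Replace spaces with underscores and drop the special characters listed above."""
    name = name.strip().replace(" ", "_")
    name = "".join(ch for ch in name if ch not in BAD_CHARS)
    return re.sub(r"_+", "_", name)  # collapse runs of underscores


if __name__ == "__main__":
    print(safe_name("mouse weights (final)!.tsv"))  # mouse_weights_final.tsv
```

Dashes and dots are left alone, since the slide lists them as acceptable alternatives.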
  • 23. Be consistent • Establishing standards makes data more findable • Extending standards to everyone who works on a project is even better
  • 24. Renaming files • Ways to automate file renaming • Bulk Rename Utility (Windows, free) • Renamer 5 (Mac) • PSRenamer (Linux, Mac, or Windows, free)
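Besides the tools listed above, a short script can also automate renaming. The sketch below is not one of those tools; it is a hypothetical helper, `bulk_rename`, that applies any renaming rule to the files in one folder and defaults to a dry run so the changes can be reviewed first.

```python
from pathlib import Path
from typing import Callable


def bulk_rename(folder: Path, rule: Callable[[str], str], dry_run: bool = True) -> list[tuple[str, str]]:
    """Apply `rule` to every file name in `folder` and return (old, new) pairs.

    With dry_run=True nothing on disk changes; the pairs only show what would happen.
    """
    changes = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        new_name = rule(path.name)
        if new_name == path.name:
            continue
        target = path.with_name(new_name)
        if target.exists():
            # Never overwrite an existing file silently.
            raise FileExistsError(f"{target} already exists; refusing to overwrite")
        if not dry_run:
            path.rename(target)
        changes.append((path.name, new_name))
    return changes
```

For example, `bulk_rename(Path("Data"), lambda n: n.replace(" ", "_"))` lists the planned changes for an assumed `Data` folder, and the same call with `dry_run=False` carries them out. Running it on a copy first is a sensible precaution.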
  • 25. Exercise: Rename Lou’s files • Use descriptive names • General to specific • Avoid abbreviations, spaces and special characters • Be consistent
  • 26. Tidy data How to organize your data efficiently in spreadsheets
  • 27. Spreadsheets as lab notebook • Color coding • Formatting • Notes • Calculations • Graphs/Tables
  • 28. Downsides • Computers don’t understand notes/formatting/color coding • Calculations/Graphs/Tables in spreadsheets are inefficient • “Tidy data” + automation = saved time
  • 29. Using spreadsheets wisely • Don’t put multiple tables in one sheet • Don’t use multiple sheets • Use descriptive field names • Don’t mix notes and data
  • 30. Tidy Data 1. Columns as variables • Don’t combine multiple pieces of info in one column 2. Rows as observations • One measured value
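As an illustration of "columns as variables, rows as observations", the sketch below reshapes a wide table (one column per date) into a tidy one (one row per mouse per date). The mouse IDs and weights are invented for the example and are not Lou's data; the function and field names are likewise hypothetical.

```python
def wide_to_long(rows, id_field, variable_name, value_name):
    """Turn one row per subject with one column per date into one row per observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue
            tidy.append({id_field: row[id_field], variable_name: column, value_name: value})
    return tidy


# Invented example values: one row per mouse, one column per weighing date.
wide = [
    {"mouse_id": "m01", "2016-05-07": 21.3, "2016-05-08": 21.6},
    {"mouse_id": "m02", "2016-05-07": 19.8, "2016-05-08": 20.1},
]

if __name__ == "__main__":
    for observation in wide_to_long(wide, "mouse_id", "date", "weight_g"):
        print(observation)
```

In the tidy form, each column holds exactly one variable (mouse, date, weight), so adding another day of measurements adds rows rather than columns.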
  • 31. Exercise: Tidy Lou’s data • Open MouseInventory.xls • Is he using spreadsheets wisely? • Is each column a variable? • Is each row an observation? • Open the January files for both weight and cytokines • What variables are being measured (i.e., what columns should we have)? • Can we combine some of these tables?
  • 32. Exercise: Data Carpentry ecology • Lesson: http://www.datacarpentry.org/spreadsheet-ecology-lesson/ • File: https://ndownloader.figshare.com/files/2252083 • Goal: combine data from the first 2 tabs into one table • Make a new tab; don’t edit the raw data!
  • 33. Example: Supplemental_data_1_xls • https://figshare.com/articles/Supplemental_data_1_xls/4055544 • Description: “Table of the results given by HPLC analysis of the samples. Key: Rt, retention time; +, presence of peak; -, absence of peak.”
  • 34. Example: cck8_xls • https://figshare.com/articles/cck8_xls/3505772 • Description: “This data are from CCK-8 assay and ELISA.”
  • 35. Need help? • Email: tobin.magle@colostate.edu • Data Management Services website: http://lib.colostate.edu/services/data-management • Data Carpentry: http://www.datacarpentry.org/ • Software Carpentry: http://software-carpentry.org/