Leveling Up Data Management

Kristin Briney
Data Services Librarian

This talk reviews tips and tools for leveling up your data management skills. Areas covered include: storage, file naming conventions, version control, documentation, and data clean up.

I’m a former chemistry researcher who was really bad at the data management game
the first time I played it.
Now I’m a data services librarian who has produced a book, a blog, and videos in this
area.
I want to make the data management game easy and understandable to all players.
This presentation will not only show you tools but also provide tips on leveling up
during the game.
Beware flash drives as a storage option.
Cloud storage is a great option for the 3-2-1 Rule’s offsite copy.
Not all cloud storage is created equal (read Google Drive’s terms of service). And don’t
rely only on cloud storage for your data (several horror stories here).
Many cloud storage providers offer free storage up to a certain amount, and then it’s a
paid plan.
I like SpiderOak. It is primarily a cloud backup solution, so it is less suited to file
sharing (other options are available for that).
It’s billed as “zero knowledge” cloud storage. Files get encrypted on your computer
before being sent to their servers, meaning the company can’t read your files and they
stay secure when travelling across the internet (this is really important).
I combine this with my local computer and an external hard drive to make my 3 copies.
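
Keeping three copies only helps if the copies actually match. The following is a minimal sketch, assuming the usual reading of the 3-2-1 Rule (three copies, on two types of storage, with one offsite), of how you might check that backup copies are identical to the original by comparing SHA-256 checksums. The function names and any paths you pass in are hypothetical and are not part of any backup product.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copies_match(original: Path, copies: list[Path]) -> dict[str, bool]:
    """Map each copy's path to True if it exists and matches the original."""
    expected = sha256_of(original)
    return {
        str(copy): copy.is_file() and sha256_of(copy) == expected
        for copy in copies
    }
```

You would call `copies_match` with the file on your computer as `original` and the external-drive and synced cloud-folder versions as `copies`. Any entry that comes back `False` is a copy that is missing or has drifted from the original.
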
I don’t use Bulk Rename Utility often, but it’s so useful when I do.
Bulk Rename Utility is free for personal use on Windows.
It allows you to rename a large number of files at the same time (such as when you
have a file naming convention you want to apply to existing files).
The interface looks complicated but that is because it is so powerful.
You can: replace particular characters, add or remove things at a particular position,
easily add numbering or dates, swap parts of the file name around, etc.
It takes a few minutes to learn, but it’s a great tool to have in your back pocket.
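
If you are not on Windows, or just want to see what a bulk rename does under the hood, the same idea can be sketched in a few lines of Python. This is only an illustration and not how Bulk Rename Utility works internally. The naming convention used here (date, cleaned-up name, three-digit number) is a hypothetical example, as are the function names.

```python
import re
from pathlib import Path


def conventional_name(old_name: str, date: str, number: int) -> str:
    """Build a name like 2015-06-01_sample-a_003.csv (hypothetical convention)."""
    path = Path(old_name)
    # Lowercase the stem and turn runs of spaces or symbols into single hyphens.
    stem = re.sub(r"[^a-z0-9]+", "-", path.stem.lower()).strip("-")
    return f"{date}_{stem}_{number:03d}{path.suffix.lower()}"


def plan_renames(folder: Path, date: str) -> list[tuple[Path, Path]]:
    """Return (old, new) pairs without touching any files (a dry run)."""
    files = sorted(p for p in folder.iterdir() if p.is_file())
    return [
        (old, old.with_name(conventional_name(old.name, date, i)))
        for i, old in enumerate(files, start=1)
    ]


def apply_renames(plan: list[tuple[Path, Path]]) -> None:
    """Rename files, refusing to overwrite anything that already exists."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(f"{new} already exists")
        old.rename(new)
```

Splitting the work into `plan_renames` and `apply_renames` lets you print and check the plan before any file is changed, much like previewing the new names in a rename tool. Try it on a copy of the folder first.
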
Regular expressions (regex) are an amazing tool for search and replace.
Regex doesn’t stand alone, but rather plugs into other tools like Bulk Rename Utility,
Notepad++, Java, etc.
Regex works by pattern matching, allowing you to search for all social security numbers
in a document, reformat any phone numbers, change the order of sections in a
document but keep the text the same, etc.
Regex takes a bit more learning but is incredibly useful for anyone doing text
manipulation or clean up.
The first link on this slide is to a tutorial I like.
The second link is to a tool, RegExr, that allows you to test your written regular
expressions against text.
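
To make the pattern-matching idea concrete, here is a small sketch using Python’s built-in `re` module. The patterns assume US-style social security numbers and phone numbers and are deliberately simple, so treat them as a starting point rather than a complete solution. The function names are made up for this example.

```python
import re

# Three digits, two digits, four digits, separated by hyphens (US SSN layout).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# US-style phone number: optional parentheses around the area code,
# then an optional space, dot, or hyphen between the groups of digits.
PHONE_PATTERN = re.compile(
    r"(?<!\d)\(?(\d{3})\)?[ .-]?(\d{3})[ .-]?(\d{4})(?!\d)"
)


def find_ssns(text: str) -> list[str]:
    """Return every substring that looks like a social security number."""
    return SSN_PATTERN.findall(text)


def normalize_phones(text: str) -> str:
    """Rewrite every phone number in the form 608-555-0123."""
    return PHONE_PATTERN.sub(r"\1-\2-\3", text)


def swap_last_first(text: str) -> str:
    """Reorder 'Last, First' to 'First Last' while keeping the words the same."""
    return re.sub(r"\b(\w+), (\w+)\b", r"\2 \1", text)
```

The parentheses in each pattern capture pieces of the match, and `\1`, `\2`, `\3` in the replacement put those pieces back in a new layout. That is the trick behind both reformatting and reordering.
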
Versioning files by hand takes up a lot of hard drive space.
A version control system, like Git, tracks what changed between one version and the
next and stores the history compactly, so you are not keeping a hand-made copy of the
whole file for every version. It also streamlines the versioning process.
Such tools came out of computer science but are being used by many researchers.
Git is free and open source.
Git is different from GitHub: Git handles the version control, while GitHub
hosts the files and versions and can make them available to others.
Git is really useful but has a learning curve. Because of that, I recommend starting with
the GUI version unless you are comfortable with the command line.
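
To see what “the differences between one version and the next” look like, here is a minimal sketch using Python’s standard `difflib` module. It only illustrates the idea of comparing two versions of a text file. It is not Git, and it says nothing about how Git stores data internally. The function name and the default file name are hypothetical.

```python
import difflib


def describe_changes(old_text: str, new_text: str, name: str = "notes.txt") -> str:
    """Return a unified diff between two versions of a text file."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=f"{name} (v1)",
        tofile=f"{name} (v2)",
    )
    # An empty string means the two versions are identical.
    return "".join(diff)
```

In the output, lines starting with `-` were removed and lines starting with `+` were added, which is the same convention you will see when a version control tool shows you what changed.
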
This tool originated in computer code.
You don’t need anything more complicated than a text editor to make one! I use
Notepad++.
Excel is a useful tool but isn’t always the best tool for cleaning data.
It’s especially bad with dates and tends to mangle them.
OpenRefine is a free, open source tool that was previously known as Google Refine.
It is the best tool for cleaning up tabular data.
OpenRefine can break data down by “facet” (variable values or ranges), allowing you to
do quick parsing, counting, or editing.
Editing includes straight replacement, math, basic text manipulation (uppercase to
lowercase, etc.), or other functions using General Refine Expression Language (GREL,
formerly Google Refine Expression Language).
You can also break multi-component cells apart or combine them into one.
The tool also allows for text clean up, providing a number of different algorithms for
text matching.
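
The two ideas here, faceting and text matching, can be sketched in plain Python to show what is going on. This is an illustration only and not OpenRefine’s own code. The `fingerprint` function is a simplified version of the kind of key-based matching such tools offer, and all function names are chosen for this example.

```python
import re
from collections import Counter, defaultdict


def facet(values: list[str]) -> Counter:
    """Count how often each distinct value appears, like a text facet."""
    return Counter(values)


def fingerprint(value: str) -> str:
    """Build a key that near-duplicate strings share (simplified)."""
    # Trim, lowercase, and drop punctuation.
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    # Split into words, remove repeats, and sort so word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values: list[str]) -> list[list[str]]:
    """Group distinct values whose fingerprints collide; keep groups of 2+."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

For a column containing “Madison, WI”, “madison wi”, and “WI Madison”, `facet` would report three different values, while `cluster` would group them as likely variants of one value for you to review and merge.
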
