Data Management Lab: Session 2 slides

Spring 2014 Data Management Lab: Session 2 Slides (more details at http://ulib.iupui.edu/digitalscholarship/dataservices/datamgmtlab)

What you will learn:
1. Build awareness of research data management issues associated with digital data.
2. Introduce methods to address common data management issues and facilitate data integrity.
3. Introduce institutional resources supporting effective data management methods.
4. Build proficiency in applying these methods.
5. Build strategic skills that enable attendees to solve new data management problems.

Published in: Education, Technology

  1. Research Data Management. Spring 2014: Session 2. Practical strategies for better results. University Library, Center for Digital Scholarship
  2. Review: Key points from last week? What is still unclear?
  3. DMP
     • Data map: complete the partially mapped research question, OR start your own data map.
     • Don’t forget to upload your DMP to Box. Suggested file name: DMP_20140401
  4. PROJECT & DATA DOCUMENTATION (MODULE 2)
  5. LEARNING OUTCOMES
     • Outline planned project and data documentation in a data management plan.
     • Identify documentation and metadata required to describe data.
  6. Why do we document?
  7. What is Metadata
     [Figure, from Michener et al. 1997: data details (vertical axis) plotted against time (horizontal axis), with these annotations]
     • Time of data development
     • Specific details about problems with individual items or specific dates are lost relatively rapidly
     • General details about datasets are lost through time
     • Accident or technology change may make data unusable
     • Retirement or career change makes access to “mental storage” difficult or unlikely
     • Loss of data developer leads to loss of remaining information
  8. Why do we document?
     “Scientific publications have at least two goals: (i) to announce a result and (ii) to convince readers that the result is correct… papers in experimental science should describe the results and provide a clear enough protocol to allow successful repetition and extension” (Mesirov, 2010)
  9. Analysis and Workflows
     • Reproducibility at core of scientific method
     • Complex process = more difficult to reproduce
     • Good documentation required for reproducibility
       – Metadata: data about data
       – Process metadata: data about process used to create, manipulate, and analyze data
     • Provenance: where your data came from and what has been done to it. Crucial for replication/reproducibility.
     (CC image by Richard Carter on Flickr)
  10. Why do we document?
     • Provide an accurate, reliable record of your work
       – Including all the details you will not remember when it’s time to write up the project
     • Facilitate writing of high quality publications
     • Necessary for reproducibility, a core principle of scientific process
     • Establish provenance
       – Relevant to commercial application and patents (legal), defending your publications (scientific), responsible conduct of research (scientific)
  11. Best Practices
     Best Practices for Preparing Ecological Data Sets, ESA, August 2010
     The 20-Year Rule
     • The metadata accompanying a data set should be written for a user 20 years into the future: what does that investigator need to know to use the data?
     • Prepare the data and documentation for a user who is unfamiliar with your project, methods, and observations
  12. Data Management Planning: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze
  13. What?
     • Everything that is crucial for others to understand, interpret, evaluate, and build on your work
     • What do YOU think?
     • Metadata should capture the who, what, when, where, how, why of your data
  14. Think-Pair-Share
     • What do you think you need to document about your project?
     • Share with your partner/group
     • Share with the class
  15. What? How much? Project-level
     • Project history, aims, objectives and hypotheses
     • Data collection methods: data collection protocol, sampling design, instruments, hardware and software used, data scale and resolution, temporal coverage and geographic coverage
     • Dataset structure: data files, cases, relationships between files
     • Data sources used (enough detail to find them again)
     • Data validation, checking, proofing, cleaning and other quality assurance procedures carried out
     • Modifications made to data over time since their original creation and identification of different versions of datasets
     • Information on data confidentiality, access and use conditions
  16. What? How much? Data-level
     • Names, labels and descriptions for variables, records and their values
     • Units of measurement
     • Explanation of codes and classification schemes used
     • Codes of, and reasons for, missing values
     • Derived data created after collection, with code, algorithm or command file used to create them
     • Weighting and grossing variables created
     • Data listing with descriptions for cases, individuals or items studied
     • Equipment, instruments, or other data collection tools used
     • Field, lab, or interview conditions
  17. What? How much?
     • What went right
       – So you can repeat/replicate it
     • What went wrong
       – So you can determine the cause (e.g., human error, machine error, etc.) and prevent it from happening again
  18. Some Effective Strategies
     • Data
       – Data models
       – Data dictionaries
       – Metadata
     • Project (see Documentation Instructions for examples)
       – Procedures Manual
       – Protocols
       – Lab Notebooks
       – Codebook
       – Reference Libraries
  19. Data Models
  20. Data Dictionary
     A description of all study variables; for each variable:
     • Variable name
     • Role of the variable (analytical)
     • Variable label
     • Unit of measurement (if applicable)
     • Type of variable
     • Permissible values or range of values
     • Definitions of redefined or derived variables
     • Additional edits to be performed (logic & consistency)
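
One way to keep a data dictionary machine-readable is to store each variable as a small record. The sketch below is only an illustration of the bullets on this slide: the class name `VariableEntry`, its field names, and the example variable `age_yrs` are all invented here and do not come from the workshop materials.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VariableEntry:
    """One data dictionary row; fields loosely mirror the slide's bullets."""
    name: str                      # variable name
    label: str                     # human-readable variable label
    role: str                      # analytical role, e.g. "predictor"
    var_type: str                  # e.g. "integer", "categorical"
    unit: Optional[str] = None     # unit of measurement, if applicable
    allowed_range: Optional[Tuple[float, float]] = None

    def is_permissible(self, value: float) -> bool:
        """Return True when value lies inside the documented range."""
        if self.allowed_range is None:
            return True            # no range documented, nothing to check
        low, high = self.allowed_range
        return low <= value <= high


# Hypothetical example variable, invented for illustration only.
age = VariableEntry(
    name="age_yrs",
    label="Participant age at enrollment",
    role="predictor",
    var_type="integer",
    unit="years",
    allowed_range=(18, 99),
)

print(age.is_permissible(42))
print(age.is_permissible(120))
```

Keeping the permissible range next to the variable definition means the same record can document the data and drive a simple consistency check.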
  21. What is Metadata?
     • A structured set of terms describing a defined world
       – Standardized
       – Structured
     • Metadata can be created automatically or manually
     • Ex: ClinicalTrials.gov
  22. Why Use/Create Metadata?
     • Metadata is critical for communicating context for data
     • How is metadata used?
       – To find things
       – To describe things
       – To merge things
     • Metadata standards define a common set of terms and structure to communicate information
       – Enables consistency, shared definitions, shared language, and shared structure for interoperability
     • Different standards have been developed for different purposes (social science data, clinical trials, ecology)
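
To make "a common set of terms and structure" concrete, here is a minimal sketch of a structured record with a completeness check. The six field names are an assumption based on the "who, what, when, where, how, why" prompt in slide 13; they are not taken from any published metadata standard, and the record values are invented.

```python
import json

# Hypothetical field set; a real project would follow its discipline's standard.
REQUIRED_FIELDS = ("who", "what", "when", "where", "how", "why")


def missing_fields(record: dict) -> list:
    """List the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Invented example record with one field deliberately left out.
record = {
    "who": "Example Lab",
    "what": "Survey B raw responses",
    "when": "2011-01-03",
    "where": "Online",
    "how": "Web questionnaire",
}

print(missing_fields(record))                       # ['why']
print(json.dumps(record, indent=2, sort_keys=True))  # structured, shareable form
```

Because every record uses the same keys, records from different people can be searched, compared, and merged, which is the point the slide makes about standards.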
  23. Everyday Examples (HEB site)
  24. Everyday Examples
  25. Creating Metadata
  26. Think-Pair-Share
     • Transform narrative description to structured metadata using the provided template.
     • Write the information corresponding to the field on your index card.
     • Abstract at http://doi.org/10.1542/peds.2013-1488
  27. Good Documentation Examples
     • http://datadryad.org/resource/doi:10.5061/dryad.ph8s5
       – Readme files
       – File-level metadata (e.g., Keywords, Scientific Names, Spatial Coverage, Temporal Coverage)
     • Selected ICPSR dataset 34792
       – Codebook
       – Scope of Study
       – Citation & Metadata Exports
  28. DMP
     Sections to draft and/or refine:
     • Metadata
       – Documentation Checklist
  29. References
     1. HEB site. (2014). Reading Nutrition Labels. From http://www.heb.com/page/recipes-cooking/cooking-tips/reading-nutrition-labels
     2. Mesirov, J. P. (2010). Computer science. Accessible reproducible research. Science, 327(5964), 415-416. From http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3878063/?tool=pubmed
     3. Target Clear Rx bottles. From http://www.brandchannel.com/features_profile.asp?pr_id=248
  30. ORGANIZING DATA & FILES (MODULE 2)
  31. LEARNING OUTCOMES
     • Develop a consistent and coherent file organization and naming convention scheme for all project files.
     • Select appropriate non-proprietary hardware and software formats for storing data.
     • Create protected copies of files at crucial points in your study.
     • Use versioning software or documentation for tracking changes to files over time.
  32. File Organization & Naming
     • Be Clear, Concise, Consistent, Correct, and Conformant
     • Consider what is necessary to find and access files in the next year and when the project is complete.
     • Develop a scheme and use it.
     • Track changes.
  33. Organization: Filing v. Piling
     • Filing (hierarchical)
       – When organizing files, the top-level folder (directory) should include the project title, unique identifier, and date (year).
       – The substructure should have a clear, documented naming convention; for example, each run of an experiment, each version of a dataset, and/or each person in the group.
     • Piling (tags)
       – All files in one directory, rely on sorting and searching.
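
The filing approach can be scripted so every project starts from the same skeleton. This is a rough sketch under stated assumptions: the helper `make_project_tree`, the underscore-joined folder name, the identifier `G1234`, and the subfolder names are all hypothetical choices, and the demo runs in a temporary directory so nothing is left on disk.

```python
import tempfile
from pathlib import Path


def make_project_tree(root: Path, title: str, identifier: str, year: int,
                      subfolders: list) -> Path:
    """Create a top-level folder named title_identifier_year with subfolders."""
    top = root / f"{title}_{identifier}_{year}"
    top.mkdir(parents=True, exist_ok=True)
    for sub in subfolders:
        (top / sub).mkdir(exist_ok=True)
    return top


# Demo in a throwaway directory; all names here are invented examples.
with tempfile.TemporaryDirectory() as tmp:
    top = make_project_tree(Path(tmp), "diss", "G1234", 2014,
                            ["run01", "run02", "docs"])
    created = sorted(p.name for p in top.iterdir())
    print(top.name, created)
```

Documenting the convention (for example in a readme at the top level) matters as much as the script, since the slide asks for a "clear, documented" substructure.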
  34. File Names
  35. Courtesy of PhD Comics
  36. Naming Files
     • Be Clear, Concise, Consistent, Correct, and Conformant
     • Make it meaningful
     • Remember the purpose is to provide context
  37. Elements of a File Name
     • Project/grant name and/or number
     • Date of creation/modification
     • Name of creator/investigator: last name first followed by (initials of) first name
     • Research team/department associated with the data
     • Content or subject descriptor
     • Data collection method (instrument, site, etc.)
     • Version number
     • Project phase
  38. Naming Strategies (Whitmire, 2014)
     • Date first
       – 20110103_diss_surveyB_raw
       – 20110118_diss_surveyB_raw
       – 20110119_diss_inter_trans
       – 20110204_diss_surveyB_quest-B
     • Subject first
       – diss_surveyB_raw_20110103
       – diss_surveyB_raw_20110118
       – diss_inter_trans_20110119
       – diss_surveyB_quest_20110204
     • Type first
       – surveyB_raw_diss_20110103
       – surveyB_raw_diss_20110118
       – inter_trans_diss_20110119
       – surveyB_quest_diss_20110204
     • Numbered (forced ordering)
       – 01_diss_survey_raw_20110103
       – 01_diss_survey_raw_20110118
       – 02_diss_inter_trans_20110119
       – 04_diss_survey_quest-B_20110204
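
The date-first and subject-first patterns above can be generated from the same elements, which helps keep names consistent. The following is a minimal sketch; the function names `date_first` and `subject_first` and their parameters are hypothetical, chosen only to reproduce the slide's examples.

```python
from datetime import date


def date_first(project: str, content: str, stage: str, created: date) -> str:
    """Build a name like 20110103_diss_surveyB_raw (date leads)."""
    return f"{created:%Y%m%d}_{project}_{content}_{stage}"


def subject_first(project: str, content: str, stage: str, created: date) -> str:
    """Build a name like diss_surveyB_raw_20110103 (date trails)."""
    return f"{project}_{content}_{stage}_{created:%Y%m%d}"


names = [
    date_first("diss", "surveyB", "raw", date(2011, 1, 18)),
    date_first("diss", "inter", "trans", date(2011, 1, 19)),
    date_first("diss", "surveyB", "raw", date(2011, 1, 3)),
]

# With YYYYMMDD up front, a plain alphabetical sort is also chronological.
for name in sorted(names):
    print(name)
print(subject_first("diss", "surveyB", "raw", date(2011, 1, 3)))
```

Which element leads the name determines how a file browser groups the files, so the choice depends on whether you more often look things up by date or by subject.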
  39. Technical Tips
     • For sequential numbering, use leading zeros.
       – For example, a sequence of 1-10 should be numbered 01-10; a sequence of 1-100 should be numbered 001-010-100.
     • No special characters in file names: & , * % # ; * ( ) ! @$ ^ ~ ' { } [ ] ? < > -
     • Use only one period, and only before the file extension (e.g. name_paper.doc NOT name.paper.doc OR name_paper..doc)
     • Will your files still be unique and comprehensible if moved to another location?
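
These tips are simple enough to check automatically before files are shared. The sketch below assumes the forbidden set is exactly the characters listed on the slide (which includes the hyphen); the helper names `check_name` and `padded` are invented for this illustration.

```python
# Characters the slide lists as off limits (note that the hyphen is included).
FORBIDDEN = set("&,*%#;()!@$^~'{}[]?<>-")


def check_name(filename: str) -> list:
    """Return a list of problems found in filename; empty means it passes."""
    problems = []
    bad = sorted(set(filename) & FORBIDDEN)
    if bad:
        problems.append("special characters: " + " ".join(bad))
    if filename.count(".") != 1:
        problems.append("expected exactly one period, before the extension")
    return problems


def padded(number: int, largest: int) -> str:
    """Zero-pad number to the width of the largest value in the sequence."""
    return str(number).zfill(len(str(largest)))


print(check_name("name_paper.doc"))    # []
print(check_name("name.paper.doc"))
print(padded(1, 10), padded(1, 100), padded(10, 100))   # 01 001 010
```

Leading zeros matter because most file browsers sort names as text, so without padding `10` would sort before `2`.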
  40. Think-Pair-Share
     • Develop a file naming scheme for your project (enter it in your DMP).
     • Share it with your partner.
     • Share with class.
  41. File Formats
     • Choose formats that are more likely to be accessible in the future (10-20 years)
       – Non-proprietary
       – Open, documented standard
       – Commonly used
       – Standardized (ASCII, Unicode)
     • Also, if possible
       – Unencrypted
       – Uncompressed
     • Ex: PDF/A (not .doc/x), ASCII (not .xls/x), MPEG-4, TIFF or JPEG2000, XML or RDF (not RDBMS)
  42. Master Files
     • Provide snapshots of key phases in the data life cycle
       – Raw
       – Cleaned
       – Phases of processing
     • In combination with detailed documentation, these files make write-up easier and support reproducibility and reuse
     • Demonstrate provenance (i.e., an audit trail)
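
A master file can be approximated by a read-only copy plus a checksum recorded in the project documentation. This is a sketch of one possible approach, not the workshop's prescribed method: the helper `lock_master_copy`, the `master` folder name, and the example file are assumptions, and SHA-256 is simply a convenient choice of fingerprint.

```python
import hashlib
import shutil
import stat
import tempfile
from pathlib import Path


def lock_master_copy(source: Path, master_dir: Path) -> str:
    """Copy source into master_dir, mark the copy read-only, return its SHA-256."""
    master_dir.mkdir(parents=True, exist_ok=True)
    target = master_dir / source.name
    shutil.copy2(source, target)   # copy2 also preserves timestamps
    # Drop all write permission bits so accidental edits are refused.
    target.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return hashlib.sha256(target.read_bytes()).hexdigest()


# Demo in a throwaway directory with an invented raw data file.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "20110103_diss_surveyB_raw.csv"
    raw.write_text("id,answer\n1,yes\n")
    digest = lock_master_copy(raw, Path(tmp) / "master")
    # The working file still matches the locked snapshot at this point.
    unchanged = digest == hashlib.sha256(raw.read_bytes()).hexdigest()
    print(digest[:12], unchanged)
```

Recomputing the checksum later and comparing it with the recorded value gives a simple audit trail showing whether the snapshot has been altered.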
  43. Version Control
     • Manual – file names
       – Sequential numbered system
       – Dated
     • Automatic – version control software
       – Mercurial
       – TortoiseSVN
       – GitHub
     • Keep log files, supplement with documentation (e.g., readme.txt, comments, etc.)
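
For the manual, sequentially numbered option, a small helper can work out the next version name so numbers are never reused. The `_vNN` suffix convention and the function `next_version` are assumptions made for this sketch; the slides do not specify a suffix format.

```python
import re

# Assumed convention: names end in _v followed by digits, e.g. report_v02.
VERSION_PATTERN = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)$")


def next_version(existing: list, stem: str, width: int = 2) -> str:
    """Return the next zero-padded version name for stem, given existing names."""
    highest = 0
    for name in existing:
        match = VERSION_PATTERN.match(name)
        if match and match.group("stem") == stem:
            highest = max(highest, int(match.group("num")))
    return f"{stem}_v{highest + 1:0{width}d}"


existing = ["diss_surveyB_clean_v01", "diss_surveyB_clean_v02", "diss_inter_trans_v01"]
print(next_version(existing, "diss_surveyB_clean"))   # diss_surveyB_clean_v03
print(next_version(existing, "diss_codebook"))        # diss_codebook_v01
```

Manual numbering like this records that a file changed but not what changed, which is why the slide pairs it with log files and a readme.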
  44. DMP
     Sections to work on:
     • Format (revise)
       – Are you choosing the best formats?
     • Data organization (write)
       – File & Folder structure
       – File naming convention
       – Master files/Data locks
  45. References
     1. DataONE Education Module: Data Management Planning. DataONE. From http://www.dataone.org/sites/all/documents/L03_DataManagementPlanning.pptx
     2. DataONE Education Module: Data Citation. DataONE. From http://www.dataone.org/sites/all/documents/L09_DataCitation.pptx
     3. McNeill, K. (2013). Research Data Management: File Organization. From http://libraries.mit.edu/guides/subjects/data-management/File%20Organization_JulyAP2013.pdf
     4. MIT. (2014). Organizing your files. From http://libraries.mit.edu/guides/subjects/data-management/organizing.html
     5. Savage, A. (n.d.). Mythbusters. From http://weknowmemes.com/2012/10/the-only-difference-between-screwing-around-and-science/
     6. Whitmire, A. (2014). Research Data Management – Organizing Your Data. From http://guides.library.oregonstate.edu/grad521lectures
  46. Why do we document?
  47. Wrapping up: What’s next? Mid-point evaluation
