Documentation and Metadata - VA DM Bootcamp

Documentation and Metadata session for the VA Data Management Bootcamp, January 2015

Published in: Education

  1. 1. Documentation and Metadata Sherry Lake, Andrea Denton [Figure: Data Life Cycle diagram with the stages Re-Purpose, Re-Use, Deposit, Data Collection, Data Analysis, Data Sharing, Proposal Planning Writing, Data Discovery, End of Project, Data Archive, Project Start Up]
  2. 2. We’ll Explore • Why is documenting your research important? • What do you document (files? datasets? projects?) Hands-on • What are the common types of documentation? • Metadata: What is it? Why is it important? Hands-on • Q & A
  3. 3. You’re already documenting your data • Notebook – Paper – Digital – Lab • Folders with notes, text files • Sources, experiments or surveys, procedures, etc.
  4. 4. Critical roles of data documentation • Data Use – To know enough details about how the data were collected and stored • Data Discovery – To be able to identify important data sets • Data Retrieval – To know how and where to access data • Data Archiving – Data can grow more valuable with time, but only if the critical information required to retrieve and interpret the data remains available
  5. 5. Information Entropy [Figure: information content of data and metadata plotted against time, starting at the time of data development] • Specific details about problems with individual items or specific dates are lost relatively rapidly • General details about datasets are lost through time • Accident or technology change may make data unusable • Retirement or career change makes access to “mental storage” difficult or unlikely • Loss of investigator leads to loss of remaining information From Michener et al. 1997, http://dx.doi.org/10.1890/1051-0761(1997)007[0330:NMFTES]2.0.CO;2
  6. 6. Elements of Documentation Good data documentation answers these basic questions: • Why were the data created? • What is the data about? • What is the content of the data? The structure? • Who created the data? • Who maintains it?
  7. 7. Elements of Documentation, continued • How were the data created? • How were the data produced/analyzed? • Where was it collected (geographic location)? • When were the data collected? When were they published? • How should the data be cited?
  8. 8. Documentation throughout your research • Variable or Item Level – Labels, codes, classifications – Missing values (and how they are represented) • File or Dataset Level – Inventory of data files – Relationship between those files – Records, cases, etc. • Project or Study Level – What the study set out to do; research questions – How it contributes new knowledge to the field – Methodologies used, instruments and measures UK Data Service: http://ukdataservice.ac.uk/media/440277/documentingdata.pdf/
  9. 9. Exercise 1: Exploring Documentation • Refer to the files on the Data Management Bootcamp site, either – http://guides.lib.odu.edu/VADMBC/materials • In the section Documentation and Metadata Exercise_1_Data_Documentation Worksheet – Or, you may have a handout “Exercise 1”
  10. 10. Exercise 1: Exploring Documentation • For Column 1, take 2-3 minutes and, for each row, write down what general concept (who, what, when, where, how, or why, or a combination of these) that field describes about data, if applicable. • Now take 2-3 minutes to complete Column 2. Considering your research data, what information would you provide for each field? • Don’t have research data? Use the file DailyWeather to fill in Column 2.
  11. 11. Exercise 1 continued • Take 2 minutes • There is a blank row under each category for any information specific to your field, e.g. latitude and longitude, species, etc. • Please share an example with the class in the Google doc “Questions: Ask them here”
  12. 12. Wrapping up: elements of documentation • We’ve looked at commonly used fields • What does your discipline say about what you should document? • The answers you’ve provided could be used to create a data dictionary, which we’ll examine next
  13. 13. Types of Documentation • ReadMe File • Data Dictionary • Codebook
  14. 14. ReadMe • Describes the core documentation about an investigation and its data files • Typically a simple text file • Can describe the individual file(s) and/or data package as a whole
  15. 15. ReadMe Example - File
  16. 16. ReadMe Example - File
  17. 17. ReadMe Example - Dataset
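
The ReadMe example slides were images and did not survive in this transcript. As a stand-in, here is a minimal sketch of how a plain-text ReadMe could be generated from the who/what/when/where/how/why questions on slides 6 and 7. The field labels, the `build_readme` helper, and every example value are assumptions made up for illustration; they are not taken from the workshop files.

```python
# Hypothetical ReadMe fields, loosely following the questions on slides 6-7.
README_FIELDS = [
    ("Title", "title"),
    ("Creator (who)", "creator"),
    ("Purpose (why)", "purpose"),
    ("Content and structure (what)", "content"),
    ("Methods (how)", "methods"),
    ("Location (where)", "location"),
    ("Dates collected (when)", "dates"),
    ("Suggested citation", "citation"),
]


def build_readme(info: dict) -> str:
    """Render one 'Label: value' line per field; flag anything left blank."""
    lines = []
    for label, key in README_FIELDS:
        value = info.get(key, "NOT PROVIDED")
        lines.append(f"{label}: {value}")
    return "\n".join(lines) + "\n"


# Made-up example values, for illustration only.
example_info = {
    "title": "Daily weather observations (example)",
    "creator": "Example Researcher",
    "content": "One CSV file, one row per station per day",
    "dates": "January 2015",
}

readme_text = build_readme(example_info)
print(readme_text)
```

Fields that were not supplied show up as `NOT PROVIDED`, which makes gaps in the documentation visible instead of silently dropping them.
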
  18. 18. Data Dictionary • Provides definitions of the data fields in a data file • Gives more detail on the variables and observations in a file
  19. 19. Data Dictionary • Used to understand the data and the databases that contain it • Identifies data elements and their attributes including names, definitions and units of measure and other information • Often they are organized as a table http://www.pnamp.org/sites/default/files/best_practices_for_data_dictionary_definitions_and_usage_version_1.1_2006-11-14.pdf
  20. 20. Data Dictionary Example: the dataset http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=HowToSubmit.pdf
  21. 21. Data Dictionary Example: the dictionary
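
Since a data dictionary is often organized as a table with one row per variable, a first draft can be generated from the data file itself and then completed by hand. The sketch below is one possible approach using only the Python standard library. The sample CSV, its column names, and the missing-value codes are invented for illustration and are not the contents of the DailyWeather file.

```python
import csv
import io

# Made-up sample data; column names and values are illustrative only.
SAMPLE_CSV = """station,date,tmax_c,precip_mm
A1,2015-01-05,4.5,0.0
A1,2015-01-06,,2.3
B2,2015-01-05,6.1,-9999
"""

# Assumed conventions for missing values; document whichever ones you use.
MISSING_CODES = {"", "-9999"}


def infer_type(values):
    """Return 'numeric' if every value parses as a float, else 'text'."""
    if not values:
        return "unknown"
    try:
        for v in values:
            float(v)
        return "numeric"
    except ValueError:
        return "text"


def draft_dictionary(csv_text):
    """Build one draft dictionary entry per column of the CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    entries = []
    for name in rows[0].keys():
        raw = [row[name] for row in rows]
        present = [v for v in raw if v not in MISSING_CODES]
        entries.append({
            "variable": name,
            "type": infer_type(present),
            "n_missing": len(raw) - len(present),
            "example": present[0] if present else "",
            "definition": "",  # to be written by the researcher
            "units": "",       # to be written by the researcher
        })
    return entries


dictionary = draft_dictionary(SAMPLE_CSV)
for entry in dictionary:
    print(entry)
```

The `definition` and `units` columns are deliberately left empty: the script can count and classify values, but only the person who collected the data can say what a variable means.
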
  22. 22. Exercise 2: Data Dictionary • Refer to the files on the Data Management Bootcamp site, either – http://guides.lib.odu.edu/VADMBC/materials • In the section Documentation and Metadata Exercise_2_DataDictionaryTemplate – Or, you may have a handout “Exercise 2” • Open the file DailyWeather Weather data source: http://www.ncdc.noaa.gov/cdo-web/search?datasetid=GHCND
  23. 23. Exercise 2: Data Dictionary Creation • Use the Daily Weather dataset – Two worksheets (tabs) • Data • Definitions • Start by answering the questions • Fill out a data dictionary for this dataset
  24. 24. Exercise 2 Discussion
  25. 25. What is a Codebook? • Typical in social sciences research • Includes elements similar to readme and dictionary – Project level information (e.g. survey design and methodology) – Response codes for each variable – Codes used to indicate nonresponse and missing data http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook
  26. 26. What is a Codebook? • Codebooks may also contain: – A copy of the survey questionnaire (if applicable) – Exact questions and skip patterns used in a survey – Frequencies of response • Quite long! http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/faqs/2006/01/what-is-codebook
  27. 27. Codebook Example http://www.icpsr.umich.edu/icpsrweb/ICPSR/help/cb9721.jsp
  28. 28. Codebook Example http://dataarchives.ss.ucla.edu/archive%20tutorial/aboutcodebooks.html
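
To make the codebook elements concrete (response codes, codes for nonresponse and missing data, and frequencies of response), here is a small sketch. The variable name, question wording, codes, and responses are all hypothetical; a real codebook would take them from the survey instrument.

```python
from collections import Counter

# Hypothetical codebook entry for a single survey variable.
CODEBOOK = {
    "q1_satisfaction": {
        "question": "How satisfied are you with the service? (made-up wording)",
        "codes": {1: "Dissatisfied", 2: "Neutral", 3: "Satisfied"},
        "missing": {8: "Don't know", 9: "No response"},
    }
}


def frequencies(variable, responses):
    """Count responses per code and label each count using the codebook."""
    entry = CODEBOOK[variable]
    labels = {**entry["codes"], **entry["missing"]}
    counts = Counter(responses)
    return {
        labels.get(code, f"UNDOCUMENTED CODE {code}"): n
        for code, n in sorted(counts.items())
    }


# Made-up responses; the 7 is deliberately not in the codebook.
responses = [3, 3, 1, 9, 2, 3, 8, 7]
freq = frequencies("q1_satisfaction", responses)
print(freq)
```

Any code that appears in the data but not in the codebook is reported as undocumented, which is exactly the kind of gap a codebook is meant to prevent.
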
  29. 29. Other Examples of Data Documentation • Lab notebooks • Software syntax • Programming code • Instrument settings and/or calibration • Provenance of sources of data • Embedded metadata (e.g. EXIF, FITS)
  30. 30. Metadata • What is it? – Information that describes a resource – NISO: “metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource” • Why is it important? – Enables a resource or data to be easily discovered – Good metadata will help others understand and use your data
  31. 31. Metadata in Everyday Life • Author(s): Boullosa, Carmen. • Title(s): They're cows, we're pigs / by Carmen Boullosa • Place: New York : Grove Press, 1997. • Physical Descr: viii, 180 p ; 22 cm. • Subject(s): Pirates Caribbean Area Fiction. • Format: Fiction Source: DataONE Education Module: Metadata. DataONE. Retrieved Nov 12, 2012, from http://www.dataone.org/sites/all/documents/L07_Metadata.pptx
  32. 32. Metadata Formats • Documentation for understanding & re-use – Readme File – Data Dictionary – Codebook • Structured documentation in XML format for use in programs (few examples) – DDI – FGDC – EML
  33. 33. Exercise 3: XML File Creation • Refer to the files on the Data Management Bootcamp site, either – http://guides.lib.odu.edu/VADMBC/materials • In the section Documentation and Metadata Exercise_3_Weather-DDI-XML-FillinBlanks – Or, you may have a handout “Exercise 3”
  34. 34. Exercise 3: XML File Creation • Take the file Weather-DDI-XML and fill in the blanks (as best you can) using: • the file DailyWeather • and/or Exercise 2 Data Dictionary
  35. 35. Exercise 3 Discussion
  36. 36. Exercise 3 Discussion
  37. 37. Exercise 3 Discussion
  38. 38. Structured XML A Few Standard Schemes (XML) – DDI – Data Documentation Initiative http://www.ddialliance.org/ – FGDC – Geospatial Metadata Standard http://www.fgdc.gov/metadata/geospatial-metadata-standards – EML – Ecological Metadata Language http://knb.ecoinformatics.org/software/eml/
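
As a rough illustration of what "structured documentation in XML format for use in programs" means, the sketch below writes a small XML record and reads it back with Python's standard library. The element names (`codeBook`, `stdyDscr`, `titl`, `dataDscr`, `var`, `labl`) are borrowed from DDI Codebook as an assumption; the output omits namespaces and required elements, is not validated against the DDI schema, and is not the Weather-DDI-XML exercise file. The title and variables are made up.

```python
import xml.etree.ElementTree as ET


def build_codebook(title, variables):
    """Build a simplified, DDI-inspired XML tree (not schema-valid DDI)."""
    root = ET.Element("codeBook")

    # Study-level description: here, just the title.
    study = ET.SubElement(root, "stdyDscr")
    citation = ET.SubElement(study, "citation")
    title_stmt = ET.SubElement(citation, "titlStmt")
    ET.SubElement(title_stmt, "titl").text = title

    # Variable-level description: one <var> per variable with a label.
    data = ET.SubElement(root, "dataDscr")
    for name, label in variables:
        var = ET.SubElement(data, "var", name=name)
        ET.SubElement(var, "labl").text = label
    return root


# Made-up study title and variables, for illustration only.
root = build_codebook(
    "Daily weather example",
    [
        ("tmax_c", "Maximum daily temperature (degrees Celsius)"),
        ("precip_mm", "Daily precipitation (millimetres)"),
    ],
)

xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# Because the record is structured, a program can read it back reliably.
parsed = ET.fromstring(xml_text)
var_names = [v.get("name") for v in parsed.iter("var")]
print(var_names)
```

The round trip at the end is the point of the exercise: unlike a free-text ReadMe, a structured record lets software find the title or the list of variables without guessing at the layout.
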
  39. 39. FGDC Example
  40. 40. Structured Metadata Tools Tools – Colectica add-on for Excel (DDI) – Nesstar (DDI) – Metavist (FGDC) – ArcGIS (FGDC) * – Morpho (EML) http://data.library.virginia.edu/data-management/plan/metadata/metadata-workshop/
  41. 41. Example 1: Nesstar DDI Tool
  42. 42. Example 2: Metavist FGDC Tool
  43. 43. Metadata Standards Metadata Concept Map by Amanda Tarbet is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
  44. 44. Metadata Wrap-up How to choose a metadata standard or documentation format? • What does your discipline use? • Look at what the repository where you will deposit requires
  45. 45. Research Life Cycle [Figure: Research Life Cycle and Data Life Cycle diagram with the stages Re-Purpose, Re-Use, Deposit, Data Collection, Data Analysis, Data Sharing, Proposal Planning Writing, Data Discovery, End of Project, Data Archive, Project Start Up]
  46. 46. QUESTIONS?