  • In this segment of the course we will cover: What is metadata? What are examples of metadata in our daily lives? And what information needs to be included in a metadata record?
  • Data collected in the field are recorded in a wide variety of ways, including field notebooks, streaming data from satellites, and data created from models.
  • After returning from the field, scientists will transfer field notes into spreadsheets and other types of databases in preparation for their data analysis. Displayed here is a partial copy of a data set taken from the website “Frog Watch”. Notice there is no indication of Celsius or Fahrenheit in the “temperature” column. This is a simple example of how it is difficult to understand a dataset without all of the information.
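The missing-unit problem described above can be made concrete with a small sketch. The column names, units, and values below are hypothetical and are not taken from the Frog Watch dataset; the point is only that a data dictionary kept alongside the table removes the ambiguity.

```python
# Hypothetical field observations; column names and values are illustrative only.
observations = [
    {"site": "Pond A", "temperature": 18.5, "calls_heard": 3},
    {"site": "Pond B", "temperature": 21.0, "calls_heard": 7},
]

# A data dictionary: one entry per column, stating its meaning and unit.
data_dictionary = {
    "site": {"description": "Name of the survey location", "unit": None},
    "temperature": {
        "description": "Air temperature at survey start",
        "unit": "degrees Celsius",
    },
    "calls_heard": {
        "description": "Number of distinct frog calls heard",
        "unit": "count",
    },
}


def undocumented_columns(rows, dictionary):
    """Return column names that appear in the data but not in the dictionary."""
    columns = set()
    for row in rows:
        columns.update(row)
    return sorted(columns - set(dictionary))


def describe(column, dictionary):
    """Return a readable label for a column, including its unit if one is given."""
    entry = dictionary[column]
    unit = entry["unit"]
    label = f"{column}: {entry['description']}"
    return f"{label} ({unit})" if unit else label
```

With a dictionary like this, a reader of the "temperature" column no longer has to guess between Celsius and Fahrenheit, and any column added later without documentation can be flagged.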
  • Once scientists have collected and analyzed data, they publish their conclusions in appropriate science journals.
  • A dataset is a collection of data. Often datasets are considered spatial or tabular. However, many tabular datasets are inherently spatial – they represent spatial information. There are a variety of elements that can be found in a dataset including values, measures, points, conditions, qualities, frequencies or attributes.
  • If you were to share your data, what type of information would be most useful for understanding the data set? Alternatively, when receiving data from an external source, what information is needed to understand the data set? Metadata contains information about the dataset that allows it to be understood when shared among scientists.
  • When sharing data, some considerations include: why the data were created; what limitations, if any, the data have; what the data mean; and who should be cited if someone publishes something that used the data. When receiving data from another source, consider: What are the data gaps? What processes were used for creating the current data? Are there any fees associated with the data? At what scale were the data created? What do the values in the tables mean? What software do I need in order to read the data? What projection are the data in? Can I give this data to someone else? Metadata contains information about a data set, in a standardized format, such that it can be understood and re-used.
  • Metadata is data about data. It describes the content, quality, condition, and other characteristics of a dataset. Metadata records answer questions such as: Why was the data set created? What processes were used to create the data set? What projection is the data in? When was the data last updated? Who created the data? What scale was used? What fields are in the table? What do the values in those fields mean? Who do I contact about getting more information about the data? How do I obtain a copy of the data? Do the data cost anything? Are there any limitations to the data? Metadata is a valuable tool. Metadata records preserve the usefulness of data over time by detailing methods for data collection and data set creation. Metadata greatly minimizes duplication of effort in the collection of expensive digital data and fosters the sharing of digital data resources.
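As a rough illustration of the questions listed above, a metadata record can be pictured as a set of named fields, one per question. The field names and values in this sketch are hypothetical and do not follow FGDC or any other standard's schema; it only shows how unanswered questions in a record can be detected.

```python
# Hypothetical field names, one per question a metadata record should answer.
REQUIRED_FIELDS = [
    "title", "originator", "purpose", "process_description",
    "projection", "last_updated", "scale", "attributes",
    "contact", "distribution", "cost", "use_constraints",
]

# Hypothetical record; two fields are deliberately left undocumented.
record = {
    "title": "Example amphibian call survey",
    "originator": "Example Research Group",
    "purpose": "Monitor seasonal calling activity",
    "process_description": "Volunteers recorded calls heard during timed surveys",
    "projection": "Geographic, WGS 84",
    "last_updated": "2012-06-30",
    "scale": None,  # not yet documented
    "attributes": {"temperature": "Air temperature in degrees Celsius"},
    "contact": "data-manager@example.org",
    "distribution": "Available on request",
    "cost": "No charge",
    "use_constraints": "",  # not yet documented
}


def missing_fields(rec, required=REQUIRED_FIELDS):
    """List required fields that are absent or empty in a record."""
    return [name for name in required if not rec.get(name)]


print(missing_fields(record))  # prints ['scale', 'use_constraints']
```

A check like this is no substitute for a standard's own validation tools, but it shows the idea that each question corresponds to a field that is either answered or not.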
  • Metadata is all around us, from MP3 players to nutrition labels to library card catalogues. For example, a card catalogue tells us more than just the title of a book; it also tells the user: Who is the author? Who published the book? What subject area does the book fall in? And finally, where is it located in the library? Another example of metadata that we see in our daily lives is the nutrition and ingredient information on food labels. Nutrition labels answer questions such as: What ingredients were used? Who made the food? How many calories per serving? How many servings in the can? What percentage of daily vitamins are in each serving?
  • An established standard provides common terms, definitions, and structure that allow for consistent communication. The use of standards also supports search and retrieval in automated systems.
  • This is an example of a metadata record using the Federal Geographic Data Committee (FGDC) standard.
  • Metadata is useful to Data Users, Data Developers, and Organizations. In this era of data sharing, collaboration, and need for information organization, metadata can serve multiple purposes.
  • Even if the value of data documentation is recognized, concerns remain as to the effort required to create metadata that effectively describes the data.
  • Metadata does require time and effort to create. The workload, however, is reduced when metadata creation is incorporated into the data development process and the effort is distributed among data contributors. Metadata creation and management should be treated as a standard data development procedure, and resources for staff and time should be included in project and proposal work plans and budgets. The use of a standardized metadata format and the development of discipline-specific ‘profiles’ of metadata can enable data users to quickly find needed information and address data developer concerns about metadata use and comprehension.
  • What value does metadata have to Data Developers? Metadata records help avoid data duplication because researchers can determine whether data already exist. Scientists are able to share reliable information about a dataset by creating metadata and passing it along with the dataset. Scientists wishing to reuse a dataset can be confident of its origins, data quality, and other valuable information about the data. Metadata also allows data creators to publicize the valuable data they have collected by making the metadata available on clearinghouses and other publicly available venues. Metadata can be used in citation practices, thus increasing the visibility of the data.
  • Metadata allows the user to search for and access data from a variety of sources. A search for metadata can be constrained to a geographic boundary, thus showing the user what data have been collected in a particular region. Metadata records help users determine whether the data will be applicable for use in a particular study. Finally, metadata records are of value to data users because they describe how a dataset can be acquired and whether there are any restrictions on how the data can be used.
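A search constrained to a geographic boundary, as described above, can be sketched as a bounding-box test against each record. Everything here is an assumption made for illustration: the record layout, the titles and coordinates, and the simplification that no bounding box crosses the 180° meridian. Real catalogs and clearinghouses provide their own search interfaces.

```python
from dataclasses import dataclass


@dataclass
class BoundingBox:
    """Extent in decimal degrees; assumes the box does not cross 180 degrees."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other):
        """True if the two boxes overlap or touch."""
        return not (
            self.east < other.west
            or other.east < self.west
            or self.north < other.south
            or other.north < self.south
        )


# Hypothetical metadata records with spatial extents and keywords.
records = [
    {
        "title": "Hypothetical wetland survey",
        "bbox": BoundingBox(-106.0, 35.0, -105.0, 36.0),
        "keywords": ["amphibians", "wetlands"],
    },
    {
        "title": "Hypothetical coastal transects",
        "bbox": BoundingBox(-124.5, 40.0, -123.5, 41.5),
        "keywords": ["birds"],
    },
]


def search(recs, area, keyword=None):
    """Return titles of records whose extent overlaps the search area."""
    hits = []
    for rec in recs:
        if not rec["bbox"].intersects(area):
            continue
        if keyword is not None and keyword not in rec["keywords"]:
            continue
        hits.append(rec["title"])
    return hits
```

For example, `search(records, BoundingBox(-107.0, 34.0, -104.0, 37.0))` returns only the wetland survey, because the coastal record lies outside the search area.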
  • An organization that keeps current metadata can benefit in many ways. Metadata records help protect the organization’s investment in the data by retaining information about how the data were collected, processed, and quality controlled. This creates a permanent record of the dataset, which is critical institutional memory. When researchers leave or retire, metadata allows the dataset to “live on” for the organization. The data may be reused in another research project in the future, and future researchers in the organization will need to know how the dataset was created. Finally, metadata advertises an organization’s research, creating new potential partnerships and collaboration through data sharing.
  • This graph illustrates the phenomenon of “information entropy” associated with research. At the time of the research project, a scientist’s memory is fresh. Details about the development of the dataset are easily recalled, and it is a good time to document information about the process. Over time, memory of the details begins to fade. A variety of circumstances can intervene, and eventually detailed knowledge about the dataset is lost. Without a metadata record, the data might be unusable. A dataset is not considered complete without a metadata record to accompany it.
  • Sound data management is best achieved by making metadata creation a part of the workflow. Not only does it keep the individual scientist organized, but the data also have a much better chance of being re-used by future scientists.
  • Metadata is very beneficial because it can be used to support data distribution, data management, and project management. To be best utilized, metadata should be considered a component of the data, created during the development of the data, and populated with rich content.
  • Metadata supports data distribution through discovery, publication, and data portals.
  • Metadata serves data discovery at multiple levels. Initial identification is made by querying keywords, location, time, and attributes. A scientist can then quickly assess how useful the data are for a project by reading the access and use constraints; data quality measures of positional and attribute accuracy and sources used; and statements as to data availability, format, and pricing. Finally, a user can find out how to access the data by reading access instructions, any standard order process instructions, and contact information for the dataset.
  • A collection of metadata records can be published to the web in a variety of formats, including: a website catalog; a web accessible folder (WAF) that is later harvested; a Z39.50 clearinghouse; metadata services such as ESRI ArcIMS Metadata Service; and a geospatial data portal.
  • Geospatial data portals are plentiful, and contain easily accessible metadata collections from a variety of institutions.
  • The USGS Core Science Metadata Clearinghouse is an example of a metadata repository, available to all researchers.
  • Metadata supports data management in a variety of ways because metadata records can be used to assist data maintenance and update, accountability, data liability, and discovery and reuse.
  • For data management, metadata records can be queried to determine: Do we have data older than 10 years? Do we have data that were collected before some political or geophysical event that resulted in significant change? Do we have data that used some older or now invalid data as a source? Do we have data that used older or now invalid methods? Global edits to contacts, policies, URLs, and information about new derivations of the data set can be included in metadata records, thus assisting the data management process.
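The management questions above translate naturally into simple queries over a collection of records. The sketch below assumes a hypothetical in-memory list of records with made-up field names, titles, and dates; an actual metadata catalog would expose its own query and editing tools.

```python
from datetime import date

# Hypothetical metadata records; field names and values are illustrative only.
records = [
    {
        "title": "Hypothetical stream survey",
        "collected": date(1998, 5, 1),
        "sources": ["old_basemap_v1"],
        "contact": "j.doe@example.org",
    },
    {
        "title": "Hypothetical land cover map",
        "collected": date(2010, 9, 15),
        "sources": ["imagery_2009"],
        "contact": "j.doe@example.org",
    },
]


def years_before(day, years):
    """Return the same calendar day a number of years earlier."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:  # 29 February falling in a non-leap target year
        return day.replace(year=day.year - years, day=28)


def older_than(recs, years, today):
    """Titles of records collected more than the given number of years ago."""
    cutoff = years_before(today, years)
    return [r["title"] for r in recs if r["collected"] < cutoff]


def collected_before(recs, event_date):
    """Titles of records collected before a significant event."""
    return [r["title"] for r in recs if r["collected"] < event_date]


def using_source(recs, invalid_sources):
    """Titles of records that list any now-invalid source."""
    invalid = set(invalid_sources)
    return [r["title"] for r in recs if invalid & set(r["sources"])]


def replace_contact(recs, old, new):
    """Global edit: update a contact in every record; return how many changed."""
    changed = 0
    for r in recs:
        if r["contact"] == old:
            r["contact"] = new
            changed += 1
    return changed
```

For instance, `older_than(records, 10, date(2012, 11, 12))` returns only the stream survey, and `replace_contact` updates both records in a single pass when a data manager changes.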
  • It is important to note that metadata is useful not only for those trying to find and reuse good data, but also for researchers managing their own data.
  • If you create robust metadata, you can use information contained in the metadata to locate and re-use data. The metadata contains information about themes, geographic location, time ranges, analytical methods, sources, and data quality.
  • Metadata allows you to repeat a scientific process if methodologies, variables, and analytical parameters are well defined. It allows you to defend and demonstrate scientific process. A defensible process enables you to demonstrate the methodologies that led to decisions using the data. Increasingly, a savvy public demands metadata with datasets for consumer information purposes.
  • Metadata creation requires that you are accountable for your data and can document everything you know and do not know about your dataset.
  • Metadata is invaluable for data liability. For example, a record will indicate the purpose – why the data was collected, any use constraints that are associated with the data, how complete the dataset is, and who is liable once the data is distributed and reused.
  • Metadata records support project management activities in a variety of ways including project planning, monitoring, coordination, and deliverables.
  • Metadata can serve as a project design document that establishes the project’s intent, extents, suggested source data resources, and database design. By doing so, project expectations are clearly outlined, metadata is immediately integrated into the process and the record serves as a medium to record project progress.
  • If metadata is created during the course of the research, the record can be used to monitor: where the data set is in the data development process; any problems associated with data quality that should be addressed prior to further development; and any problems with proposed methods and/or sources that will require a change in approach to data analysis.
  • Metadata can be a means to improve communications among project participants. A project metadata template can be developed to establish descriptors, parameters (time, geography, species, etc.), vocabularies, contact info, entity/attributes, and distribution information. If actively used by the team, the metadata can be a source for identifying and tracking the use of new source data and analytical methods, and for adding information pertinent to the project that is used by other project participants.
  • Metadata should be specified as a component of any data deliverable. If a part of a project is contracted out, include metadata as a deliverable in the contract. Be sure to specify the standard required and the level of completion required in the record itself. It is not enough to say ‘compliant metadata’ as you will likely get minimal information in the record. Instead, provide clear guidance and, preferably, a sample record that provides core info (Who should be named as Originator, what liability and use constraints statements to use) and illustrates the level of detail expected.
  • There are many standards available to document data. Each has a different focus, yet each asks for similar information about the data set.
  • More factors to consider: Your organization’s policies: do they state which standard to use? What resources are available to create metadata? Examples of tools: FGDC CSDGM: Mermaid (NOAA) (Forest Service) (USGS); EML: Morpho; XML Spy or Oxygen; CatMD. Other factors: availability of human support; instructional materials; use of controlled vocabularies; output formats.
  • Metadata is documentation of data. A metadata record captures critical information about the content of a dataset. Metadata allows data to be discovered, accessed, and re-used. A metadata standard provides structure and consistency to data documentation. Standards and tools vary; select according to defined criteria such as data type, organizational guidance, and available resources. Metadata is of critical importance to data developers, data users, and organizations. Metadata can be effectively used for data distribution, data management, and project management. Metadata completes a dataset.
    Suggested citation: DataONE Education Module: Metadata. DataONE. Retrieved Nov 12, 2012. From tx Copyright license information: No rights reserved; you may enhance and reuse for your own purposes. We do ask that you provide appropriate citation and attribution to DataONE.