DataONE Education Module 07: Metadata

Lesson 7 in a set of 10 created by DataONE on Best Practices for Data Management. The full module can be downloaded from the DataONE.org website at: http://www.dataone.org/education-modules. Released under a CC0 license, attribution and citation requested.

Published in: Education

  • In this segment of the course we will cover: What is metadata? What are examples of metadata in our daily lives? And what information needs to be included in a metadata record?
  • Data collection in the field is recorded in a wide variety of ways, including field notebooks, streaming data from satellites, and data created from models.
  • After returning from the field, scientists will transfer field notes into spreadsheets and other types of databases in preparation for their data analysis. Displayed here is a partial copy of a data set taken from the website “Frog Watch”. Notice there is no indication of Celsius or Fahrenheit in the “temperature” column. This is a simple example of how it is difficult to understand a dataset without all of the information.
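The missing unit in the temperature column can be illustrated with a minimal sketch. The column name, the definition text, and the choice of Celsius are assumptions made only for this example; the source table does not state the unit.

```python
# Hypothetical attribute-level metadata for one column of a tabular dataset.
# The unit and definition are assumptions; the source table does not state them.
column_metadata = {
    "average_temperature": {
        "definition": "Mean temperature at time of observation",
        "unit": "celsius",
    }
}


def to_fahrenheit(value, column, metadata):
    """Convert a value to Fahrenheit using the unit recorded in the metadata."""
    unit = metadata[column].get("unit")
    if unit == "celsius":
        return value * 9 / 5 + 32
    if unit == "fahrenheit":
        return value
    # Without a documented unit the value cannot be interpreted safely.
    raise ValueError(f"No usable unit documented for column '{column}'")


print(to_fahrenheit(10.0, "average_temperature", column_metadata))
```

If the unit entry is absent, the function refuses to guess, which mirrors the point of the slide: the number alone is not enough.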
  • Once scientists have collected and analyzed data, they publish their conclusions in appropriate science journals.
  • A dataset is a collection of data. Often datasets are considered spatial or tabular. However, many tabular datasets are inherently spatial – they represent spatial information. There are a variety of elements that can be found in a dataset including values, measures, points, conditions, qualities, frequencies or attributes.
  • If you were to share your data, what type of information would be most useful for understanding the data set? Alternatively, when receiving data from an external source, what information is needed to understand the data set? Metadata contains information about the dataset that allows it to be understood when shared among scientists.
  • When sharing data, some considerations include: why the data were created; what limitations, if any, the data have; what the data mean; and who should be cited if someone publishes something that uses the data. When receiving data from another source, consider: What are the data gaps? What processes were used to create the current data? Are there any fees associated with the data? At what scale were the data created? What do the values in the tables mean? What software do I need in order to read the data? What projection are the data in? Can I give these data to someone else? Metadata contain information about a data set, in a standardized format, such that it can be understood and re-used.
  • Metadata is data about data. It describes the content, quality, condition, and other characteristics of a dataset. Metadata records answer questions such as: Why was the data set created? What processes were used to create the data set? What projection is the data in? When was the data last updated? Who created the data? What scale was used? What fields are in the table? What do the values in those fields mean? Who do I contact about getting more information about the data? How do I obtain a copy of the data? Do the data cost anything? Are there any limitations to the data? Metadata is a valuable tool. Metadata records preserve the usefulness of data over time by detailing methods for data collection and data set creation. Metadata greatly minimizes duplication of effort in the collection of expensive digital data and fosters the sharing of digital data resources.
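As a rough sketch of the questions a record answers, the snippet below models a record with the who/what/when/where/how/why fields and reports which questions are still unanswered. The class name, field names, and sample values are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class MetadataRecord:
    """Hypothetical record; each field answers one basic question about the data."""
    who: Optional[str] = None    # who created the data
    what: Optional[str] = None   # content of the data
    when: Optional[str] = None   # when the data were created
    where: Optional[str] = None  # geographic location
    how: Optional[str] = None    # how the data were developed
    why: Optional[str] = None    # why the data were developed


def unanswered(record):
    """Return the names of the questions the record leaves empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Sample values are invented for illustration.
record = MetadataRecord(who="Example Lab", what="Frog call observations", when="2010-2011")
print(unanswered(record))  # ['where', 'how', 'why']
```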
  • Metadata is all around us, from MP3 players, to nutrition labels, to library card catalogues. For example, a card catalogue tells us more than just the title of the book; it also tells the user: Who is the author? Who published the book? What subject area does the book fall in? And finally, where is it located in the library? Another example of metadata that we see in our daily lives is the nutrition and ingredient information on food labels. Nutrition labels answer questions such as: What ingredients were used? Who made the food? How many calories per serving? How many servings in the can? What percentage of daily vitamins are in each serving?
  • An established standard provides common terms, definitions, and structure that allow for consistent communication. The use of standards also supports search and retrieval in automated systems.
  • This is an example of a metadata record using the Federal Geographic Data Committee (FGDC) standard.
  • Metadata is useful to Data Users, Data Developers, and Organizations. In this era of data sharing, collaboration, and need for information organization, metadata can serve multiple purposes.
  • Even if the value of data documentation is recognized, concerns remain as to the effort required to create metadata that effectively describes the data.
  • Metadata does require time and effort to create. The workload, however, is reduced when metadata creation is incorporated into the data development process and the effort is distributed among data contributors. Metadata creation and management should be treated as a standard data development procedure and resources for staff and time should be included in project and proposal work plans and budgets. The use of a standardized metadata format and the development of discipline specific ‘profiles’ of metadata can enable data users to quickly find needed information and address data developer concerns about metadata use and comprehension.
  • What value does metadata have to data developers? Metadata records help avoid data duplication because researchers can determine whether data already exist. Scientists can share reliable information about a dataset by creating metadata and passing it along with the dataset. Scientists wishing to reuse a dataset can be confident of its origins, data quality, and other valuable information about the data. Metadata also allow data creators to publicize the valuable data they have collected by making the metadata available through clearinghouses and other publicly available venues. Metadata can be used in citation practices, thus increasing the visibility of the data.
  • Metadata allows the user to search for and access data from a variety of sources. A metadata search can be restricted to a geographic boundary, showing the user what data have been collected in a particular region. Metadata records help users determine whether the data are applicable to a particular study. Finally, metadata records are of value to data users because they describe how a dataset can be acquired and whether there are any restrictions on how the data can be used.
  • An organization that keeps current metadata can benefit in many ways. Metadata records help protect the organization’s investment in the data by retaining information about how the data were collected, processed, and quality controlled. This creates a permanent record of the dataset, which is critical institutional memory. When researchers leave or retire, metadata allows the dataset to “live on” for the organization. The data may be reused in another research project in the future, and future researchers in the organization will need to know how the dataset was created. Finally, metadata advertises an organization’s research, creating potential new partnerships and collaborations through data sharing.
  • This graph illustrates the phenomenon of “information entropy” associated with research. At the time of the research project, a scientist’s memory is fresh. Details about the development of the dataset are easily recalled, and it is a good time to document information about the process. Over time, memory of the details begins to fade. A variety of circumstances can intervene, and eventually detailed knowledge about the dataset is lost. Without a metadata record, the data might be unusable. A dataset is not considered complete without a metadata record to accompany it.
  • Sound data management is best achieved by making metadata creation a part of the workflow. Not only does it keep the individual scientist organized, but the data also have a much better chance of being re-used by future scientists.
  • Metadata is very beneficial because it can be used to support data distribution, data management, and project management. To be best utilized, metadata should be considered a component of the data, created during the development of the data, and populated with rich content.
  • Metadata supports data distribution through discovery, publication, and data portals.
  • Metadata serves data discovery at multiple levels. Initial identification is made by querying keywords, location, time, and attributes. The scientist can then quickly assess how useful the data are for a project by reading the access and use constraints; the data quality measures of positional and attribute accuracy and sources used; and the statements on data availability, format, and pricing. Finally, a user can find out how to access the data by reading the access instructions, any standard order process instructions, and the contact information for the dataset.
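The identification step (query by keyword, location, and time) can be sketched as a simple filter over a collection of records. The record layout, titles, and coordinates below are invented for illustration; real catalogs expose this through their own search interfaces.

```python
def boxes_overlap(a, b):
    """True if two (west, south, east, north) bounding boxes intersect."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def discover(records, keyword=None, bbox=None, year=None):
    """Return titles of records matching every criterion that was supplied."""
    hits = []
    for rec in records:
        if keyword and keyword.lower() not in [k.lower() for k in rec["keywords"]]:
            continue
        if bbox and not boxes_overlap(bbox, rec["bbox"]):
            continue
        if year is not None and not (rec["begin_year"] <= year <= rec["end_year"]):
            continue
        hits.append(rec["title"])
    return hits


# Hypothetical metadata collection.
records = [
    {"title": "Stream water quality samples", "keywords": ["water quality", "streams"],
     "bbox": (-124.0, 42.0, -117.0, 46.0), "begin_year": 2001, "end_year": 2008},
    {"title": "Amphibian call survey", "keywords": ["amphibians", "frogs"],
     "bbox": (-80.0, 38.0, -74.0, 42.0), "begin_year": 2005, "end_year": 2011},
]

print(discover(records, keyword="frogs"))
print(discover(records, bbox=(-123.0, 44.0, -122.0, 45.0), year=2005))
```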
  • A collection of metadata records can be published to the web in a variety of formats, including: a website catalog; a web accessible folder (WAF) that is later harvested; a Z39.50 clearinghouse; metadata services such as ESRI ArcIMS Metadata Service; and a geospatial data portal.
  • Geospatial data portals are plentiful, and contain easily accessible metadata collections from a variety of institutions.
  • The USGS Core Science Metadata Clearinghouse is an example of a metadata repository, available to all researchers.
  • Metadata supports data management in a variety of ways because metadata records can be used to assist data maintenance and update, accountability, data liability, and discovery and reuse.
  • For data management, metadata records can be queried to determine: Do we have data older than 10 years? Do we have data collected before some political or geophysical event that resulted in significant change? Do we have data that used some older or now invalid data as a source? Do we have data that used older or now invalid methods? Global edits to contacts, policies, URLs, and information about new derivations of the data set can be included in metadata records, thus assisting the data management process.
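Two of these maintenance questions can be sketched as queries over metadata records: which datasets are older than a given number of years, and which datasets list a particular (possibly outdated) dataset as a source. The record fields and sample entries are hypothetical.

```python
from datetime import date


def years_before(day, years):
    """Return the date the given number of years before `day`."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:  # 29 February when the target year is not a leap year
        return day.replace(year=day.year - years, day=28)


def older_than(records, years, as_of):
    """Titles of records last updated more than `years` years before `as_of`."""
    cutoff = years_before(as_of, years)
    return [r["title"] for r in records if r["last_updated"] < cutoff]


def derived_from(records, source_title):
    """Titles of records that list `source_title` among their sources."""
    return [r["title"] for r in records if source_title in r["sources"]]


# Hypothetical metadata collection.
records = [
    {"title": "Land cover 1998", "last_updated": date(1998, 6, 1), "sources": []},
    {"title": "Habitat model", "last_updated": date(2010, 3, 15),
     "sources": ["Land cover 1998"]},
]

print(older_than(records, 10, as_of=date(2012, 11, 12)))
print(derived_from(records, "Land cover 1998"))
```

Passing the reference date explicitly, rather than reading the system clock, keeps the query repeatable.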
  • It is important to note that metadata is not only useful for those trying to find and reuse good data, it is also very useful to the researcher in managing his/her own data.
  • If you create robust metadata, you can use information contained in the metadata to locate and re-use data. The metadata contains information about themes, geographic location, time ranges, analytical methods, sources, and data quality.
  • Metadata allows you to repeat a scientific process if methodologies, variables, and analytical parameters are well defined. It also allows you to defend your scientific process: a defensible process enables you to demonstrate the methodologies that led to decisions made using the data. Increasingly, the savvy public demands metadata with datasets for consumer information purposes.
  • Metadata creation requires that you are accountable for your data and can document everything you know and do not know about your dataset.
  • Metadata is invaluable for data liability. For example, a record will indicate the purpose – why the data was collected, any use constraints that are associated with the data, how complete the dataset is, and who is liable once the data is distributed and reused.
  • Metadata records support project management activities in a variety of ways including project planning, monitoring, coordination, and deliverables.
  • Metadata can serve as a project design document that establishes the project’s intent, extents, suggested source data resources, and database design. By doing so, project expectations are clearly outlined, metadata is immediately integrated into the process and the record serves as a medium to record project progress.
  • If metadata is created during the course of the research, the record can be used to monitor: where the data set is in the data development process; any problems associated with data quality that should be addressed prior to further development; and any problems with proposed methods and/or sources that will require a change in approach to data analysis.
  • Metadata can be a means to improve communications among project participants. A project metadata template can be developed to establish descriptors, parameters (time, geography, species, etc.), vocabularies, contact info, entity/attributes, and distribution information. If actively used by the team, the metadata can be a source for identifying and tracking the use of new source data and analytical methods, and for adding information pertinent to the project that is used by other project participants.
  • Metadata should be specified as a component of any data deliverable. If a part of a project is contracted out, include metadata as a deliverable in the contract. Be sure to specify the standard required and the level of completion required in the record itself. It is not enough to say ‘compliant metadata’ as you will likely get minimal information in the record. Instead, provide clear guidance and, preferably, a sample record that provides core info (Who should be named as Originator, what liability and use constraints statements to use) and illustrates the level of detail expected.
  • There are many standards available to document data. Each has a different focus, yet all ask for similar information about the data set.
  • More factors to consider: Your organization’s policies: do they state which standard to use? What resources are available to create metadata? Examples of tools: FGDC CSDGM: Mermaid (NOAA) http://www.ncddc.noaa.gov/metadata-standards/mermaid/ ; Metavist (Forest Service) http://ncrs.fs.fed.us/pubs/viewpub.asp?key=2737 ; TKME (USGS) http://geology.usgs.gov/tools/metadata/tools/doc/tkme.html . EML: Morpho (http://knb.ecoinformatics.org/morphoportal.jsp). ISO (http://www.fgdc.gov/metadata/iso-metadata-editor-review): XML Spy or Oxygen; CatMD. Other factors: availability of human support; instructional materials; use of controlled vocabularies; output formats.
  • Metadata is documentation of data. A metadata record captures critical information about the content of a dataset. Metadata allows data to be discovered, accessed, and re-used. A metadata standard provides structure and consistency to data documentation. Standards and tools vary; select according to defined criteria such as data type, organizational guidance, and available resources. Metadata is of critical importance to data developers, data users, and organizations. Metadata can be effectively used for data distribution, data management, and project management. Metadata completes a dataset.
  • DataONE Education Module 07: Metadata

    1. Lesson 7: Metadata. CC image by bonus on Flickr. What is Metadata
    2. • Definition of metadata • Examine information included in a metadata record • Examples of metadata standards and how to choose • Illustrate the value of metadata to data users, data providers, and organizations • Describe the utility of metadata for a variety of scenarios (CC image by Alec Couros on Flickr)
    3. After completing this lesson, the participant will be able to: • Define science metadata • Give examples of metadata that you are likely to encounter in the ‘real world’ (i.e., outside of a research context) • Identify and list the types of information typically included in metadata records for environmental datasets • Identify 3 reasons metadata is of value to data users, data developers, and organizations • List 3 uses for metadata, beyond discovery of data • Identify and describe factors that may determine which metadata standards are most appropriate for a given dataset
    4. (Data life cycle diagram: Plan, Collect, Assure, Describe, Preserve, Discover, Integrate, Analyze)
    5. CC images by acordova, CIMMYT, Justin See, kukkurovaca, ISAS, and SEDAC on Flickr
    6. Average Temperature of Observation for Each Species
       Species                    Average      Temperature         Number of     Minimum      Maximum
                                  Temperature  Standard Deviation  Observations  Temperature  Temperature
       Northern Red-legged Frog   4.4          ---                 1             4.4          4.4
       Tailed Frog                7.0          3.0                 3             4            10
       Arizona Toad               10.0         ---                 1             10           10
       Strecker's Chorus Frog     10.5         2.0                 11            9            16
       Oregon Spotted Frog        11.0         15.5                2             0            22
       New Jersey Chorus Frog     11.5         4.5                 17            3            22
       Wood Frog                  12.5         5.5                 897           0            28.8
       Spring Peeper              13.2         5.6                 569           -1           32
       Red-legged Frog            13.3         5.9                 16            4            27
    7. CC image by Heather Kennedy on Flickr
    8. • Definition: A collection of data • Generally datasets can be defined as: o Spatial – a collection of logically related features arranged in a prescribed manner such as GIS map layers, water features, etc. o Tabular – a file, spreadsheet, data in a table o Many tabular datasets are inherently “spatial”, e.g. water-quality samples associated with stream collection sites • Elements in a dataset can include: o Values, measures, points, coordinates, conditions, qualities, frequencies, or attributes that are a result of an observational study
    9. • When you provide data to someone else, what types of information would you want to include with the data? • When you receive a dataset from an external source, what types of details do you want to know about the data?
    10. • Providing data: o Why were the data created? o What limitations, if any, do the data have? o What do the data mean? o How should the data be cited if they are re-used in a new study? • Receiving data: o What are the data gaps? o What processes were used for creating the data? o Are there any fees associated with the data? o In what scale were the data created? o What do the values in the tables mean? o What software do I need in order to read the data? o What projection are the data in? o Can I give these data to someone else?
    11. Metadata is: Data ‘reporting’ • WHO created the data? • WHAT is the content of the data? • WHEN were the data created? • WHERE is it geographically? • HOW were the data developed? • WHY were the data developed? (Photo by Michelle Chang. All Rights Reserved)
    12. • Metadata is all around… (CC images by Mskadu and USDAgov on Flickr) Example catalog record: Author(s): Boullosa, Carmen. Title(s): They're cows, we're pigs / by Carmen Boullosa. Place: New York : Grove Press, 1997. Physical Descr: viii, 180 p ; 22 cm. Subject(s): Pirates Caribbean Area Fiction. Format: Fiction
    13. • A Standard provides a structure to describe data with: o Common terms to allow consistency between records o Common definitions for easier interpretation o Common language for ease of communication o Common structure to quickly locate information • In search and retrieval, standards provide: o Documentation structure in a reliable and predictable format for computer interpretation o A uniform summary description of the dataset (CC image by ccarlstead on Flickr)
    14. CC image by I like on Flickr
    15. Metadata helps… Data users; Organizations
    16. Even if the value of data documentation is recognized, concerns remain as to the effort required to create metadata that effectively describe the data. (CC image by waterlilysage on Flickr)
    17. Concern                                                          Solution
        workload required to capture accurate robust metadata           incorporate metadata creation into data development process; distribute the effort
        time and resources to create, manage, and maintain metadata     include in grant budget and schedule
        readability / usability of metadata                             use a standardized metadata format
        discipline specific information and ontologies                  ‘profile’ standard to require specific information and use specific values
    18. • Metadata allows data developers to: o Avoid data duplication o Share reliable information o Publicize efforts – promote the work of a scientist and his/her contributions to a field of study (CC image by US Embassy Guyana on Flickr)
    19. • Metadata gives a user the ability to: o Search, retrieve, and evaluate data set information from both inside and outside an organization o Find data: Determine what data exists for a geographic location and/or topic o Determine applicability: Decide if a data set meets a particular need o Discover how to acquire the dataset you identified; process and use the dataset (CC image by ASEE on Flickr)
    20. • Metadata helps ensure an organization’s investment in data: o Documentation of data processing steps, quality control, definitions, data uses, and restrictions o Ability to use data after initial intended purpose • Transcends people and time: o Offers data permanence o Creates institutional memory • Advertises an organization’s research: o Creates possible new partnerships and collaborations through data sharing (CC image by mambol on Flickr)
    21. (Figure: data details plotted against time, from Michener et al 1997.) Annotations along the curve: Time of data development. Specific details about problems with individual items or specific dates are lost relatively rapidly. General details about datasets are lost through time. Retirement or career change makes access to “mental storage” difficult or unlikely. Accident or technology change may make data unusable. Loss of data developer leads to loss of remaining information.
    22. (Figure: data details plotted against time.) Sound information management, including metadata development, can arrest the loss of dataset detail.
    23. • Metadata can support: o data distribution o data management o project management • If it is: o considered a component of the data o created during data development o populated with rich content (Diagram labels: plan, collect, derive, classify, planimetric, imagery, charette, analysis, committee, alternative, review, meta)
    25. • The descriptive content of the metadata file can be used to identify, assess, and access available data resources. IDENTIFY: keywords, geographic location, time period, attributes. ASSESS: use constraints, access constraints, data quality, availability/pricing. ACCESS: online access, order process, contacts.
    26. • A metadata collection can be published to the internet via: o website catalog o web accessible folder (WAF) o Z39.50 metadata clearinghouse o metadata service o geospatial data portal (Diagram: a user query travels over the Internet or an intranet to the metadata collection, which points to the dataset)
    27. • Examples of metadata search portals: o Data.gov • Federal e-gov geospatial data portal http://www.geo.data.gov o Metacat • Repository for data and metadata • http://knb.ecoinformatics.org/index.jsp o US Geological Survey • USGS Core Science Metadata Clearinghouse: http://mercury.ornl.gov/clearinghouse o ArcGIS Online • ESRI sponsored national geospatial data portal http://www.geographynetwork.com (CC image by RGB12 on Flickr)
    30. • Metadata records can be used to track data provenance and accuracy • Data Maintenance: o Are the data current? • Do we have data older than ten years? • Do we have data collected before some political or geophysical event that resulted in significant change? o Are the data valid? • Were they created prior to the most current source data? • Were they created prior to the most current methodologies? • Data Update: o Contact information o Distribution policies, availability, pricing, URLs o New derivations of the dataset
    31. If you create metadata, other people can discover your data. If you create metadata, you can find your own data. (CC image by Oceanit Daily Photo on Flickr)
    32. • Find your data by: o themes / attributes o geographic location o time ranges o analytical methods used o sources and contributors o data quality Discoverable data is usable data! (CC image by NASA Goddard Space Flight Center on Flickr)
    33. • Metadata allows you to repeat scientific process if: o methodologies are defined o variables are defined o analytical parameters are defined • Metadata allows you to defend your scientific process: o demonstrate process o increasingly GIS-savvy public requires metadata for consumer information (Diagram labels: INPUT, RESULTS)
    34. • Metadata is an exercise in data accountability. It requires you to assess: o What do you know about the dataset? o What don’t you know about the dataset? o What should you know about the dataset? Are you willing to associate yourself with the metadata record?
    35. • Metadata is a declaration of: o Purpose (what to do…): the originator’s intended application of the data o Use Constraints (what not to do…): inappropriate applications of the data o Completeness: features or geographies excluded from the data o Distribution Liability: explicit liability of the data producer and assumed liability of the consumer
    36. Project Coordination
    37. • Metadata records can serve as a project design document: o descriptions & intent of project o geographic and temporal extent of project o source data of project o attribute requirements of project • Benefits: o expectations are clearly outlined o metadata is integrated into the process o provides a medium to record progress
    38. • Use metadata to monitor: o data development status o QA/QC assessments o needed changes in approach Monitoring requires that the metadata be actively maintained and reviewed! (Diagram axes: milestones, time)
    39. • Metadata can be a means to improve communications among project participants using common: o descriptions & parameters o keywords, vocabularies, thesauri o contact information o attributes o distribution information • If reviewed regularly by all participants, metadata created early and updated during the project improves opportunity for coordinating: o source data o analytical methods o new information
    40. • As a key component of the data, metadata should be part of any data deliverable • For quality metadata from a deliverable, the record should provide: o Citation information o Data quality information o Accurate geospatial information o Clearly defined entities and attributes o Distribution information
    41. Image courtesy of Viv Hutchinson
    42. • Dublin Core Element Set o Emphasis on web resources, publications o http://dublincore.org/documents/dces/ • FGDC Content Standard for Digital Geospatial Metadata (CSDGM) o Emphasis on geospatial data o Biological Data Profile (BDP) of the CSDGM: profile to the CSDGM with emphasis on biological data (and geospatial) o http://www.fgdc.gov/metadata/geospatial-metadata-standards • ISO 19115/19139 Geographic information: Metadata o Emphasis on geospatial data and services o http://www.fgdc.gov/metadata/geospatial-metadata-standards#fgdcendorsedisostandards
    43. • Ecological Metadata Language (EML) o Focus on ecological data o http://knb.ecoinformatics.org/eml_metadata_guide.html • Darwin Core o Emphasis on museum specimens o http://rs.tdwg.org/dwc/index.htm • Geography Markup Language (GML) o Emphasis on geographic features (roads, highways, bridges) o http://www.opengeospatial.org/standards/gml
    44. EML                    FGDC
        Title                  Title
        Abstract               Abstract
        Entity Description     Entity Type Definition
        Intellectual Rights    Use Constraints
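The four EML and FGDC field pairings on this slide amount to a small crosswalk between standards. A minimal sketch follows; the keys are the human-readable labels from the slide, not the actual XML element names of either standard, and the sample record is invented.

```python
# Field pairings taken from the slide; labels are descriptive, not schema element names.
EML_TO_FGDC = {
    "Title": "Title",
    "Abstract": "Abstract",
    "Entity Description": "Entity Type Definition",
    "Intellectual Rights": "Use Constraints",
}


def crosswalk(eml_record):
    """Relabel an EML-style record with FGDC labels; report fields with no pairing."""
    converted, unmapped = {}, []
    for label, value in eml_record.items():
        if label in EML_TO_FGDC:
            converted[EML_TO_FGDC[label]] = value
        else:
            unmapped.append(label)
    return converted, unmapped


# Hypothetical record for illustration.
eml_record = {
    "Title": "Amphibian call survey",
    "Intellectual Rights": "No rights reserved",
    "Methods": "Nighttime listening stations",
}
converted, unmapped = crosswalk(eml_record)
print(converted)
print(unmapped)
```

Reporting unmapped fields rather than silently dropping them makes it visible where the two standards do not line up.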
    45. • Many standards collect similar information • Factors to consider: o Your data type: • Are you working mainly with GIS data? Raster/vector or point data? Do you have biological or shoreline information in your dataset? - Consider the FGDC Content Standard for Digital Geospatial Metadata with one of its profiles: the Biological Data Profile or the Shoreline Data Profile. • Are you working with data retrieved from instruments such as monitoring stations or satellites? Are you using geospatial data services such as web-mapping applications or data modeling? - If so, then consider using the ISO 19115-2 standard • Are you mainly working with ecological data? - Consider Ecological Metadata Language (EML)
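The data-type guidance on this slide can be written down as a small lookup, shown here as a sketch only. The function name and the `data_kind` labels are hypothetical, and the result is a starting suggestion; organizational policy and available tools still need to be weighed.

```python
def suggest_standard(data_kind, has_biological=False, has_shoreline=False):
    """Suggest a metadata standard for a broad kind of data.

    data_kind is a hypothetical label: "gis", "instrument", "services", or "ecological".
    """
    if data_kind == "gis":
        if has_biological:
            return "FGDC CSDGM with the Biological Data Profile"
        if has_shoreline:
            return "FGDC CSDGM with the Shoreline Data Profile"
        return "FGDC CSDGM"
    if data_kind in ("instrument", "services"):
        return "ISO 19115-2"
    if data_kind == "ecological":
        return "EML"
    return None  # no suggestion from the data type alone


print(suggest_standard("gis", has_biological=True))
print(suggest_standard("instrument"))
print(suggest_standard("ecological"))
```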
    46. • More factors to consider: o Your organization’s policies: do they state which standard to use? o What resources are available to create metadata? Examples of tools: • FGDC CSDGM: - Mermaid (NOAA) http://www.ncddc.noaa.gov/metadata-standards/mermaid/ - Metavist (Forest Service) http://ncrs.fs.fed.us/pubs/viewpub.asp?key=2737 - TKME (USGS) http://geology.usgs.gov/tools/metadata/tools/doc/tkme.html • EML: - Morpho (http://knb.ecoinformatics.org/morphoportal.jsp) • ISO: (http://www.fgdc.gov/metadata/iso-metadata-editor-review) - XML Spy or Oxygen - CatMD o Other factors: Availability of human support; instructional materials; use of controlled vocabularies; output formats
    47. • Metadata is documentation of data • A metadata record captures critical information about the content of a dataset • Metadata allows data to be discovered, accessed, and re-used • A metadata standard provides structure and consistency to data documentation • Standards and tools vary – select according to defined criteria such as data type, organizational guidance, and available resources • Metadata is of critical importance to data developers, data users, and organizations • Metadata can be effectively used for: o data distribution o data management o project management • Metadata completes a dataset. Creating robust metadata is in your OWN best interest!
    48. The full slide deck may be downloaded from: http://www.dataone.org/education-modules Suggested citation: DataONE Education Module: Metadata. DataONE. Retrieved Nov 12, 2012. From http://www.dataone.org/sites/all/documents/L07_Metadata.pptx Copyright license information: No rights reserved; you may enhance and reuse for your own purposes. We do ask that you provide appropriate citation and attribution to DataONE.
