Your SlideShare is downloading. ×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×

Introducing the official SlideShare app

Stunning, full-screen experience for iPhone and Android

Text the download link to your phone

Standard text messaging rates apply

Can ISO 19157 support current NASA data quality metadata?

267
views

Published on

ISO 19157 provides a powerful framework for describing quality of Earth science datasets. As NASA migrates towards using that standard, it is important to understand whether and how existing data …

ISO 19157 provides a powerful framework for describing quality of Earth science datasets. As NASA migrates towards using that standard, it is important to understand whether and how existing data quality content fits into the ISO 19157 model. This talk demonstrates that fit and concludes that ISO 19157 can include all existing content and also includes new capabilities that can be very useful for all kinds of NASA data users.

Published in: Science, Technology, Business

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
267
On Slideshare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
4
Comments
0
Likes
0
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide

Transcript

  • 1. ISO 19157 – A Unified Metadata Model for Data Quality Ted Habermann Director of Earth Science The HDF Group thabermann@hdfgroup.org 1 The Global Change Master Directory (GCMD) and the EOS Clearinghouse (ECHO) both include data quality information? Can this information be mapped to ISO 19157? GCMD ECHO ISO ?Presented May 21, 2014 at the Earth Science Information Partnership Information Quality Cluster
  • 2. Metadata in Multiple Dialects Documentation Repository ISO 19115-1, 19115-2, 19119 19157 and extensions THREDDS HDF, netCDF (NcML) FGDC, Data.Gov SensorML WCS, WMS, WFS, SOS Open Provenance Model, PROV DIF, ECS, ECHO KML
  • 3. GCMD Data Quality 3 The <Quality> field allows the author to provide information about the quality of the data or any quality assurance procedures followed in producing the data described in the metadata.
  • 4. GCMD Quality Examples 4 <Quality> Due to the lack of high resolution data available over the region for 1993-94, it has been hard to validate the product. However the maps of burnt areas correspond well with active fire maps for the region. Where large [>3km] scars are found, the detection is more reliable. In areas of small scars more problems are involved. It is hoped that the 1994-95 data set will cover the whole of the study area and be calibrated by high resolution data. </Quality> <Quality> QA performed by CDIAC One of the roles of the Carbon Dioxide Information Analysis Center (CDIAC) is quality assurance (QA) of data. The QA process is an important component of the value-added concept of assuring accurate, usable information for researchers, because data received by CDIAC are rarely in condition for immediate distribution, regardless of source. </Quality> <Quality> The fire training-set may also have been biased against savanna and savanna woodland fires since their detection is more difficult than in humid, forst environments with cool background temperatures [Malingreau, 1990]. There may, therefore, be an under-sampling of fires in these warmer background environments. </Quality> <Quality> Note that Data File 12, Report #2, TASK 2 (Auclair et al., 1994a) is a Quality Assurance and Quality Control chapter for the areas of Canada, Alaska, United States (48 states), with range estimates of validation and error, a listing of discussions with experts in the field and a review of the draft of data files. </Quality>
  • 5. ISO Data Quality Results DQ_ConformanceResult + specification : CI_Citation + explanation : CharacterString + pass : Boolean DQ_CoverageResultDQ_DescriptiveResult + statement : CharacterString <<Abstract>> DQ_Result + dateTime [0..1] : DateTime + resultScope [0..1] : DQ_Scope DQ_QuantitativeResult + valueType [0..1] : RecordType + valueUnit : UnitOfMeasure + value [1..*] : Record ISO 19157 includes four kinds of quality results. It includes a new kind of report (DQ_DescriptiveResult) that includes a simple text description of the result of the quality test.
  • 6. ISO Data Quality Results DQ_ConformanceResult + specification : CI_Citation + explanation : CharacterString + pass : Boolean DQ_CoverageResultDQ_DescriptiveResult + statement : CharacterString <<Abstract>> DQ_Result + dateTime [0..1] : DateTime + resultScope [0..1] : DQ_Scope DQ_QuantitativeResult + valueType [0..1] : RecordType + valueUnit : UnitOfMeasure + value [1..*] : Record <Quality> QA performed by CDIAC One of the roles of the Carbon Dioxide Information Analysis Center (CDIAC) is quality assurance (QA) of data. The QA process is an important component of the value-added concept of assuring accurate, usable information for researchers, because data received by CDIAC are rarely in condition for immediate distribution, regardless of source. </Quality> The DQ_DescriptiveResult matches the GCMD quality element exactly.
  • 7. GCMD Quality Examples 7 <Quality> Due to the lack of high resolution data available over the region for 1993-94, it has been hard to validate the product. However the maps of burnt areas cor- respond well with active fire maps for the region. Where large [>3km] scars are found, the detection is more reliable. In areas of small scars more problems are involved. It is hoped that the 1994-95 data set will cover the whole of the study area and be calibrated by high resolution data. </Quality> <Quality> QA performed by CDIAC One of the roles of the Carbon Dioxide Information Analysis Center (CDIAC) is quality assurance (QA) of data. The QA process is an important component of the value-added concept of assuring accurate, usable information for researchers, because data received by CDIAC are rarely in condition for immediate distribution, regardless of source. </Quality> <Quality> The fire training-set may also have been biased against savanna and savanna woodland fires since their detection is more difficult than in humid, forst environments with cool background temperatures [Malingreau, 1990]. There may, therefore, be an under-sampling of fires in these warmer background environments. </Quality> <Quality> Note that Data File 12, Report #2, TASK 2 (Auclair et al., 1994a) is a Quality Assurance and Quality Control chapter for the areas of Canada, Alaska, United States (48 states), with range estimates of validation and error, a listing of discussions with experts in the field and a review of the draft of data files. </Quality>
  • 8. Standalone Quality Reports DQ_Element + standaloneQualityReportDetails [0..1] : CharacterString DQ_DataQuality + scope : DQ_Scope + report 1..* DQ_StandaloneQualityReportInformation + reportReference: CI_Citation + abstract : CharacterString + standaloneQualityReport 0..1 ISO 19157 acknowledges that important quality information can exist in standalone reports that may not fit easily into the ISO conceptual model. These standalone reports are cited with abstracts that summarize the results for the user without neccessitating access to the cited report.
  • 9. ECHO Quality Information ECHO includes two kinds of quality information
  • 10. ECHO Quality Information ECHO includes two kinds of quality information QAStats – Standard measures for all products QAPercentMissingData - Granule level % missing data. This attribute can be repeated for individual parameters within a granule. QAPercentOutOfBoundsData – Granule level % out of bounds data. This attribute can be repeated for individual parameters within a granule. QAPercentInterpolatedData – Granule level % interpolated data. This attribute can be repeated for individual parameters within a granule. QAPercentCloudCover – This attribute is used to characterize the cloud cover amount of a granule. This attribute may be repeated for individual parameters within a granule. (Note - there may be more than one way to define a cloud or it's effects within a product containing several parameters; i.e. this attribute may be parameter specific)
  • 11. ECHO Quality Information ECHO includes two kinds of quality information QAFlags – Classes of quality measures with product specific implementations AutomaticQualityFlag – The granule level flag applying generally to the granule and specifically to parameters the granule level. When applied to parameter, the flag refers to the quality of that parameter for the granule (as applicable). The parameters determining whether the flag is set are defined by the developer and documented in the Quality Flag Explanation. AutomaticQualityFlagExplanation – A text explanation of the criteria used to set automatic quality flag, including thresholds or other criteria. OperationalQualityFlag – The granule level flag applying both generally to a granule and specifically to parameters at the granule level. When applied to parameter, the flag refers to the quality of that parameter for the granule (as applicable). The parameters determining whether the flag is set are defined by the developers and documented in the Operational Quality Flag Explanation. OperationalQualityFlagExplanation – A text explanation of the criteria used to set operational quality flag; including thresholds or other criteria. ScienceQualityFlag – Granule level flag applying to a granule, and specifically to parameters. When applied to parameter, the flag refers to the quality of that parameter for the granule (as applicable). The parameters determining whether the flag is set are defined by the developers and documented in the Science Quality Flag Explanation. ScienceQualityFlagExplanation – A text explanation of the criteria used to set science quality flag; including thresholds or other criteria.
  • 12. ECHO Data Quality Measures DQ_MeasureReference + measureIdentification [0..1] : MD_Identifier + nameOfMeasure [0..*] : CharacterString + measureDescription [0..1] : CharacterString if measureIdentification is not provided, then nameOfMeasure shall be provided + measure 0..1 ISO 19157 data quality measure references identify measures in several ways and provides a brief description of the measure. <<Abstract>> DQ_Element + measure [0..*] : DQ_MeasureReference + evaluationMethod [0..1] : DQ_EvaluationMethod + result [0..1] : DQ_Result QAPercentMissingData QAPercentOutOfBoundsData QAPercentInterpolatedData QAPercentCloudCover AutomaticQualityFlag OperationalQualityFlag ScienceQualityFlag AutomaticQualityFlagExplanation OperationalQualityFlagExplanation ScienceQualityFlagExplanation
  • 13. Data Quality Measures DQM_Measure + measureIdentifier : MD_Identifier + name : CharacterString + alias [0..*] : CharacterString + sourceReference [0..*] : CI_Citation + elementName [1..*] : TypeName + definition : CharacterString + description [0..1] : DQM_Description + valueType : TypeName + valueStructure [0..1] : DQM_ValueStructure + example [0..*] : DQM_Description DQM_BasicMeasure + name : CharacterString + definition : CharacterString + example : DQM_Description [0..1] + valueType : TypeName <<CodeList>> DQ_ValueStructure + bag + set + sequence + table + matrix + coverage DQM_Parameter + name : CharacterString + definition : CharacterString + description: DQM_Description [0..1] + valueType : TypeName + valueStructure [0..1] : DQM_ValueStructure DQM_Description + textDescription: CharacterString + extendedDescription [0..1] : MD_BrowseGraphic DQ_MeasureReference + measureIdentification [0..1] : MD_Identifier + nameOfMeasure [0..*] : CharacterString + measureDescription [0..1] : CharacterString if measureIdentification is not provided, then nameOfMeasure shall be provided + measure 0..1 ISO 19157 data quality measure properties are in a separate object and provide a much more complete description of the measures. <<Abstract>> DQ_Element + measure [0..*] : DQ_MeasureReference + evaluationMethod [0..1] : DQ_EvaluationMethod + result [0..1] : DQ_Result
  • 14. ECHO Quality Measure Descriptions 14 AutomaticQualityFlagExplanation (2289) 66% parameter is produced correctly 7% No automatic quality assessment is performed in the PGE 5% Based on percentage of product that is good. Suspect used where true quality is not known. 4% Automatic quality determination software not yet implemented 2% QA flag explanation ScienceQualityFlagExplanation (346) OperationalQualityFlagExplanation (238) 35% Passed 14% Passed,parameter passed the specified operational test. Inferred Pass,parameter terminated with warnings. Failed parameter terminated with fatal errors. 11% Not Investigated 10% Passed,parameter passed the specified science test. Inferred Pass,parameter terminated with warnings for specified science test. Failed parameter terminated with fatal errors for specified science test. 10% See http://landweb.nascom.nasa.gov/cgi- bin/QA_WWW/qaFlagPage.cgi?sat=aqua for the product Science Quality status. 8% Validated, see http://disc.gsfc.nasa.gov/Aura/MLS/ for quality document 6% See http://landweb.nascom.nasa.gov/cgi- bin/QA_WWW/qaFlagPage.cgi?sat=terra for the product Science Quality status. 5% An updated science quality flag and explanation is put in the product .met file when a granule has been evaluated. The flag value in this file, Not Investigated, is an automatic default that is put into every granule during production.
  • 15. ISO Data Quality Results DQ_ConformanceResult + specification : CI_Citation + explanation : CharacterString + pass : Boolean DQ_CoverageResultDQ_DescriptiveResult + statement : CharacterString <<Abstract>> DQ_Result + dateTime [0..1] : DateTime + resultScope [0..1] : DQ_Scope DQ_QuantitativeResult + valueType [0..1] : RecordType + valueUnit : UnitOfMeasure + value [1..*] : Record Passed Suspect Failed Not Investigated Inferred Passed Being Investigated
  • 16. 16 GCMD ECHO ISO YES!
  • 17. Questions? 17 Questions? tedhabermann@hdfgroup.org
  • 18. Acknowledgements This work was partially supported by contract number NNG10HP02C from NASA. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the author and do not necessarily reflect the views of NASA or The HDF Group.