
Analysis of quality metadata in the GEOSS Clearinghouse


Díaz, P., Masó, J., Sevillano, E., Ninyerola, M., Zabala, A., Serral, I., Pons, X. (2012). Analysis of quality metadata in the GEOSS Clearinghouse. International Journal of Spatial Data Infrastructures Research, Vol. 7, pp. 352-377.


  1. Analysis of the Quality Metadata in GEOSS Clearinghouse
     QUAlity aware VIsualisation for the Global Earth Observation system of systems
     SEVILLANO Eva¹, DÍAZ Paula², NINYEROLA Miquel¹, MASÓ Joan², ZABALA Alaitz¹, PONS Xavier¹
     ¹ UAB, Universitat Autònoma de Barcelona
     ² CREAF, Centre for Ecological Research and Forestry Applications
  2. Objectives
     • To get a first analysis of the data quality in the Clearinghouse
     • To analyze the quality information contained in the metadata (ISO 19115):
       – Quality indicators
       – Lineage
       – Usage
     • To start building components for the GEO Portal:
       – Quality Broker
       – Quality searcher
       – Quality visualization
     www.geoviqua.org
  3. Methodology
     • Harvest all ISO 19115 XML documents (97203) from the GEOSS Clearinghouse through its CSW interface (October 2011).
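The harvest step can be sketched as a series of paged CSW 2.0.2 GetRecords requests in KVP encoding that ask for ISO 19139 (`gmd`) output. This is a minimal sketch under stated assumptions: the endpoint URL is hypothetical (the slides do not give the Clearinghouse address), the page size of 100 is an arbitrary choice, and the code only builds the request URLs; it does not reproduce the harvesting tool the authors actually used.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: the slides do not state the Clearinghouse CSW URL.
CSW_ENDPOINT = "https://example.org/csw"
GMD_NS = "http://www.isotc211.org/2005/gmd"


def getrecords_urls(total_records, page_size=100, endpoint=CSW_ENDPOINT):
    """Yield one CSW 2.0.2 GetRecords (KVP) URL per page of results.

    CSW startPosition is 1-based, so pages start at 1, 1 + page_size, ...
    """
    for start in range(1, total_records + 1, page_size):
        params = {
            "service": "CSW",
            "version": "2.0.2",
            "request": "GetRecords",
            "typeNames": "gmd:MD_Metadata",
            "namespace": f"xmlns(gmd={GMD_NS})",
            "resultType": "results",
            "elementSetName": "full",
            "outputSchema": GMD_NS,  # ask for ISO 19139 records
            "startPosition": start,
            "maxRecords": page_size,
        }
        yield endpoint + "?" + urlencode(params)


urls = list(getrecords_urls(97203))
print(len(urls))  # 973
```

With 97203 records and 100 records per page, the last request starts at position 97201 and the server returns only the remaining three records.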
  4. Methodology
     • Load the 97203 XML documents into a database (GestBD).
     • Massive XPath extraction of metadata quality elements:
       – Quality indicators
       – Lineage
       – Usage
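The XPath extraction can be illustrated with Python's standard `xml.etree.ElementTree`, which supports a subset of XPath. This is a hypothetical sketch, not the GestBD database or the authors' queries: the paths follow the ISO 19139 encoding of ISO 19115 (`dataQualityInfo`, `report`, `lineage`, `resourceSpecificUsage`), and the sample record is invented for illustration.

```python
import xml.etree.ElementTree as ET

# ISO 19139 namespace bound to the gmd: prefix used in the paths.
NS = {"gmd": "http://www.isotc211.org/2005/gmd"}

# Paths are relative to the gmd:MD_Metadata root element of each record.
PATHS = {
    "quality_indicators": "gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:report",
    "process_steps": "gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:processStep",
    "sources": "gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/gmd:LI_Lineage/gmd:source",
    "usage": "gmd:identificationInfo/*/gmd:resourceSpecificUsage",
}


def count_quality_elements(xml_text):
    """Return how many elements each path matches in one metadata record."""
    root = ET.fromstring(xml_text)
    return {name: len(root.findall(path, NS)) for name, path in PATHS.items()}


# Invented, minimal record for illustration only.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:dataQualityInfo><gmd:DQ_DataQuality>
    <gmd:report/><gmd:report/>
    <gmd:lineage><gmd:LI_Lineage>
      <gmd:processStep/><gmd:source/>
    </gmd:LI_Lineage></gmd:lineage>
  </gmd:DQ_DataQuality></gmd:dataQualityInfo>
</gmd:MD_Metadata>"""

print(count_quality_elements(SAMPLE))
# {'quality_indicators': 2, 'process_steps': 1, 'sources': 1, 'usage': 0}
```

Summing these per-record counts over all harvested documents gives totals of the kind reported on the next slide.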
  5. Overall Results
     • Total metadata records in the Clearinghouse: 97203
     • Total number of quality indicators: 52187
     • Metadata records with quality indicators: 19107
     • Metadata records with lineage: 10899 (9261 with process steps, 3771 with sources)
     • Metadata records with usage: 1226
  6. Quality Scope
     • 19.66% of metadata records have quality indicators
       – 2.7 quality indicators per metadata record (among the records that have any)
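Both headline figures follow from the counts on the Overall Results slide. Note that the 2.7 average is taken over the 19107 records that have at least one quality indicator, not over all 97203 records. A small arithmetic check:

```python
# Counts from the "Overall Results" slide (October 2011 harvest).
total_records = 97203
records_with_qi = 19107
total_qi = 52187

share_with_qi = 100 * records_with_qi / total_records  # % of records with any QI
qi_per_record = total_qi / records_with_qi              # mean QI per record that has QI

print(f"{share_with_qi:.2f}% of records, {qi_per_record:.1f} QI per record")
# 19.66% of records, 2.7 QI per record
```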
  7. 1. Quality indicators
     • 19.66% of metadata records have quality indicators
  8. Quality indicators
     • 19.66% of metadata records have quality indicators
       – 2.7 QI/MD (quality indicators per metadata record)
  9. Quality indicators
  10. Quality indicators – Comparison Clearinghouse - IDEC
     Quality indicators in IDEC metadata (chart comparing GEOSS and IDEC):
     • Positional accuracy: 95.38%
     • Thematic accuracy: 2.60%
     • Completeness: 0.46%
     • Temporal accuracy: 0.06%
     • Logical consistency: 0.02%
  11. Quality indicator result
     • 85.8% (22275 QI)
     • 14.18% (3669 QI, mainly conformance to INSPIRE)
     • 0.02% (5 QI), ISO 19115-2 extension for "per pixel" quality
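Assuming the three slices correspond to the ISO result types (quantitative results, conformance results, and the ISO 19115-2 `gmi:QE_CoverageResult`), a per-record tally can be sketched by reading the element inside each `gmd:result`. The mapping of slices to types and the sample record are assumptions for illustration, not the authors' procedure.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"gmd": "http://www.isotc211.org/2005/gmd"}


def local_name(tag):
    """Drop the '{namespace}' prefix that ElementTree adds to qualified tags."""
    return tag.rsplit("}", 1)[-1]


def result_types(xml_text):
    """Tally result kinds: report -> DQ_Element subtype -> result -> result subtype."""
    root = ET.fromstring(xml_text)
    return Counter(
        local_name(result.tag)
        for result in root.findall(".//gmd:report/*/gmd:result/*", NS)
    )


# Invented record with one result of each kind.
SAMPLE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gmi="http://www.isotc211.org/2005/gmi">
  <gmd:dataQualityInfo><gmd:DQ_DataQuality>
    <gmd:report><gmd:DQ_AbsoluteExternalPositionalAccuracy>
      <gmd:result><gmd:DQ_QuantitativeResult/></gmd:result>
    </gmd:DQ_AbsoluteExternalPositionalAccuracy></gmd:report>
    <gmd:report><gmd:DQ_DomainConsistency>
      <gmd:result><gmd:DQ_ConformanceResult/></gmd:result>
    </gmd:DQ_DomainConsistency></gmd:report>
    <gmd:report><gmd:DQ_QuantitativeAttributeAccuracy>
      <gmd:result><gmi:QE_CoverageResult/></gmd:result>
    </gmd:DQ_QuantitativeAttributeAccuracy></gmd:report>
  </gmd:DQ_DataQuality></gmd:dataQualityInfo>
</gmd:MD_Metadata>"""

print(result_types(SAMPLE))
```

Keying on the local name lets the `gmd` result types and the `gmi` coverage extension share one counter.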
  12. Quality indicator result
  13. Quality indicators - Quantitative
     [Chart: "Quality elements - Quantitative measures"; y-axis: number of quality elements; categories: Complete value, Declare value, Quantitative type]
  14. Quality indicators - Qualitative
     [Chart: "Quality elements - Conformance measures"; y-axis: number of quality elements; categories: Conformance to specification, Declare conformance, Conformance type]
  15. Coverage result (ISO 19115-2 extension)
     • Clearinghouse record IDs: 273234, 273232, 273233, 273235, 273236
     • Only 5 records use this extension: bad news for visualizing data together with quality maps.
     • Title: OMNO2e:OMI Column Amount NO2:ColumnAmountNO2CS30

       <!-- Fragment: the gmd, gco and gmi prefixes are declared on the enclosing metadata record -->
       <gmd:DQ_QuantitativeAttributeAccuracy>
         <gmd:measureDescription>
           <gco:CharacterString>The 'version 003' product is the second public release. It is based on improved radiance calibration. For details, please see document: http://disc.sci.gsfc.nasa.gov/Aura/OMI/OMTO3e_v003.shtml</gco:CharacterString>
         </gmd:measureDescription>
         <gmd:result>
           <gmi:QE_CoverageResult>
             <gmi:spatialRepresentationType>
               <gmd:MD_SpatialRepresentationTypeCode codeList="./resources/codeList.xml#MD_SpatialRepresentationTypeCode" codeListValue="grid">grid</gmd:MD_SpatialRepresentationTypeCode>
             </gmi:spatialRepresentationType>
             <gmi:resultFile gco:nilReason="missing" />
             <gmi:resultFormat>
               <gmd:MD_Format>
                 <gmd:name><gco:CharacterString>CF-netCDF</gco:CharacterString></gmd:name>
               </gmd:MD_Format>
             </gmi:resultFormat>
           </gmi:QE_CoverageResult>
         </gmd:result>
       </gmd:DQ_QuantitativeAttributeAccuracy>
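The missing link in this record (`gmi:resultFile` carrying `gco:nilReason="missing"`) can be detected mechanically. A minimal sketch, assuming coverage results are encoded as in the fragment on this slide: it treats an empty `resultFile` property as a missing link and reports any nil reason. References given only through `xlink:href` are not checked here.

```python
import xml.etree.ElementTree as ET

GMI = "{http://www.isotc211.org/2005/gmi}"
NIL_REASON = "{http://www.isotc211.org/2005/gco}nilReason"


def coverage_result_links(xml_text):
    """For each gmi:QE_CoverageResult return (has_result_file, nil_reason)."""
    root = ET.fromstring(xml_text)
    findings = []
    for coverage in root.iter(GMI + "QE_CoverageResult"):
        result_file = coverage.find(GMI + "resultFile")
        if result_file is None:
            findings.append((False, None))
        else:
            # A populated property wraps a child element describing the file;
            # an empty one (typically with gco:nilReason) carries no link.
            findings.append((len(result_file) > 0, result_file.get(NIL_REASON)))
    return findings


# Reduced version of the slide's fragment, with namespaces declared.
RECORD = """<gmi:QE_CoverageResult xmlns:gmi="http://www.isotc211.org/2005/gmi"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmi:resultFile gco:nilReason="missing" />
</gmi:QE_CoverageResult>"""

print(coverage_result_links(RECORD))  # [(False, 'missing')]
```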
  16. 2. Lineage
  17. 2. Lineage
  18. 2. Lineage
  19. LI_ProcessStep with LI_Source
     Example: Clearinghouse record ID 131007 (simplified)
     • Compile survey input data from the best and most current survey records.
       – BLM database of the index to all official (microfilm, CD, other) BLM survey records.
       – USFS survey records.
       – Private land surveyor records.
       – GCDB Data Collection Attribute Definitions Version 2.0, Appendix A, 2/14/1991. Survey records used - source abbreviations.
     • Compile listings of known locations of PLSS corners.
       – USGS topographic quadrangles and other sources.
       – USC&GS published coordinate data.
       – NGS published coordinate data.
       – BLM global positioning data.
       – USFS global positioning data.
     • Coordinates of control stations are entered into a control database with associated reliabilities.
     • Topologically correct GIS coverages are modified to use FGDC-compliant naming conventions and then loaded into the LSI database. These layers can then be downloaded as shapefiles through the LSI website.
     • GCDB data was downloaded for Kiowa and Cheyenne Counties, Colorado.
       – C:fgis_datasandzippedkiowatwnshp.shp.xml
     • Metadata imported and data exported from regions format to shapefile format.
     • Dataset copied.
       – C:fgis_datasanddatabasedataplssck_gcdb_region_township
     • Source contribution: survey data in the form of official (microfilm, CD, other) survey and BLM, abstracted into a vector digital format (online).
     • Source contribution: survey and control data from the Cartographic Feature File (CFF) data set (disc).
     • Source contribution: digitized control data from standard topological quadrangle sheets (disc).
  20. LI_Lineage: LI_Source
     • 6.02% of metadata records (5851) contain a direct list of the data sources.
       – 1.85% (1798) with a temporal extent
     • Gives credit to the sources (attribution, and eventually some trust in them).
     • If quality indicators are not provided for the dataset, the quality indicators of its sources can be a clue.
     [UML diagram "LI_Source_only": MD_Metadata +resourceLineage 0..* → LI_Lineage +source 0..* → LI_Source
       – LI_Lineage: statement: CharacterString [0..1]; scope: DQ_Scope [0..*]
       – LI_Source: description: CharacterString [0..1]; sourceSpatialResolution: MD_Resolution [0..1]; sourceReferenceSystem: MD_ReferenceSystem [0..1]; sourceCitation: CI_Citation [0..1]; sourceMetadata: CI_Citation [0..*]; scope: DQ_Scope [0..*]
       – Constraints: the "source" role is mandatory if LI_Lineage.statement and the "processStep" role are not documented; the "processStep" role is mandatory if LI_Lineage.statement and the "source" role are not documented; in LI_Source, "description" is mandatory if "scope" is not documented, and "scope" is mandatory if "description" is not documented]
  21. LI_Lineage: LI_ProcessStep
     • 8.26% of metadata records (8035) contain a direct list of the processes, without sources.
       – 292 (0.30%) contain a date
     • The list gives the order of these processes.
     • If quality indicators are not provided for the dataset, it is difficult to infer resource quality from a process list alone.
     [UML diagram: MD_Metadata +resourceLineage 0..* → LI_Lineage +processStep 0..* → LI_ProcessStep
       – LI_Lineage: statement: CharacterString [0..1]; scope: DQ_Scope [0..*]
       – LI_ProcessStep: description: CharacterString; rationale: CharacterString [0..1]; stepDateTime: TM_Primitive [0..*]; processor: CI_ResponsiblePartyInfo [0..*]; reference: CI_Citation [0..*]; scope: DQ_Scope [0..*]
       – Constraints: the "source" role is mandatory if LI_Lineage.statement and the "processStep" role are not documented; the "processStep" role is mandatory if LI_Lineage.statement and the "source" role are not documented]
  22. Complete Provenance: LI_ProcessStep with LI_Source
     • 1.26% of metadata records (1226) have a more complete provenance description.
     • It records how and when the data sources were used.
     • If quality indicators are not provided for the dataset, we can deduce which sources have more influence on the quality of the final result.
     [UML diagram: MD_Metadata +resourceLineage 0..* → LI_Lineage +processStep 0..* → LI_ProcessStep +source 0..* → LI_Source
       – LI_Lineage: statement: CharacterString [0..1]; scope: DQ_Scope [0..*]
       – LI_ProcessStep: description: CharacterString; rationale: CharacterString [0..1]; stepDateTime: TM_Primitive [0..*]; processor: CI_ResponsiblePartyInfo [0..*]; reference: CI_Citation [0..*]; scope: DQ_Scope [0..*]
       – LI_Source: description: CharacterString [0..1]; sourceSpatialResolution: MD_Resolution [0..1]; sourceReferenceSystem: MD_ReferenceSystem [0..1]; sourceCitation: CI_Citation [0..1]; sourceMetadata: CI_Citation [0..*]; scope: DQ_Scope [0..*]
       – Constraints: the "source" role is mandatory if LI_Lineage.statement and the "processStep" role are not documented; the "processStep" role is mandatory if LI_Lineage.statement and the "source" role are not documented; in LI_Source, "description" is mandatory if "scope" is not documented, and "scope" is mandatory if "description" is not documented]
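One way to reproduce the three lineage categories of slides 20 to 22 is to check whether process steps and sources occur in a record and whether they are nested in each other (`gmd:source` inside `LI_ProcessStep`, or `gmd:sourceStep` inside `LI_Source`). This is a hypothetical classification, not necessarily the criterion the authors used, and it only inspects the ISO 19139 `gmd` classes, not the ISO 19115-2 `LE_` extensions.

```python
import xml.etree.ElementTree as ET

GMD = "{http://www.isotc211.org/2005/gmd}"


def classify_lineage(xml_text):
    """Hypothetical lineage category for one record (or LI_Lineage fragment)."""
    root = ET.fromstring(xml_text)
    steps = list(root.iter(GMD + "LI_ProcessStep"))
    sources = list(root.iter(GMD + "LI_Source"))
    # A step that nests a source, or a source that nests a step, links the
    # two: it says how and when the sources were used.
    linked = any(s.find(".//" + GMD + "LI_Source") is not None for s in steps) or any(
        s.find(".//" + GMD + "LI_ProcessStep") is not None for s in sources
    )
    if linked:
        return "process steps with sources"
    if steps and sources:
        return "process steps and sources, unlinked"
    if steps:
        return "process steps only"
    if sources:
        return "sources only"
    return "no structured lineage"


# Invented fragments for illustration.
NS_DECL = 'xmlns:gmd="http://www.isotc211.org/2005/gmd"'
SOURCES_ONLY = f"<gmd:LI_Lineage {NS_DECL}><gmd:source><gmd:LI_Source/></gmd:source></gmd:LI_Lineage>"
STEPS_ONLY = f"<gmd:LI_Lineage {NS_DECL}><gmd:processStep><gmd:LI_ProcessStep/></gmd:processStep></gmd:LI_Lineage>"
LINKED = (
    f"<gmd:LI_Lineage {NS_DECL}><gmd:processStep><gmd:LI_ProcessStep>"
    "<gmd:source><gmd:LI_Source/></gmd:source>"
    "</gmd:LI_ProcessStep></gmd:processStep></gmd:LI_Lineage>"
)

for label, doc in [("a", SOURCES_ONLY), ("b", STEPS_ONLY), ("c", LINKED)]:
    print(label, "->", classify_lineage(doc))
```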
  23. Complete provenance in ISO 19115-2
     • LE_ProcessStep, the ISO 19115-2 extension of LI_ProcessStep, includes an LE_Processing element whose runTimeParameters attribute describes the exact list of parameters used in the execution.
     • The algorithm used is cited (LE_Algorithm).
     • All these extensions were designed for gridded EO data, but they are not used in the Clearinghouse.
     • We can completely evaluate the quality of the resulting product if we know the uncertainties that the sources report in their metadata (sourceMetadata citation in LI_Source).
     [UML diagram: LI_Lineage (+resourceLineage 0..* from MD_Metadata), LI_ProcessStep (+processStep 0..*) and LI_Source, with the attributes and constraints of the previous slides, plus the ISO 19115-2:2009 imagery classes, shown for informative purposes only:
       – LE_ProcessStep: +processingInformation 0..1 → LE_Processing; +output 0..* → LE_Source; +report 0..* → LE_ProcessStepReport
       – LE_ProcessStepReport: name: CharacterString; description: CharacterString [0..1]; fileType: CharacterString [0..1]
       – LE_Source: processedLevel: MD_Identifier [0..1]; resolution: LE_NominalResolution [0..1]
       – LE_Processing: identifier: MD_Identifier; softwareReference: CI_Citation [0..1]; procedureDescription: CharacterString [0..1]; documentation: CI_Citation [0..*]; runTimeParameters: CharacterString [0..1]; +algorithm 0..* → LE_Algorithm
       – LE_Algorithm: citation: CI_Citation; description: CharacterString
       – LE_NominalResolution «Union»: scanningResolution: Distance; groundResolution: Distance
       – Constraints: if LE_NominalResolution.scanningResolution is used, LE_Source.scaleDenominator is required; "description" is mandatory if "sourceExtent" is not documented, and "sourceExtent" is mandatory if "description" is not documented]
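Reading the execution parameters described by `LE_Processing` could look like the sketch below. It assumes the ISO/TS 19139-2 `gmi` encoding, with `runTimeParameters` wrapped in a `gco:CharacterString`; the parameter string in the sample is invented, since the slides report that these extensions are not used in the Clearinghouse.

```python
import xml.etree.ElementTree as ET

GMI = "{http://www.isotc211.org/2005/gmi}"
GCO = "{http://www.isotc211.org/2005/gco}"


def run_time_parameters(xml_text):
    """Collect the runTimeParameters text of every gmi:LE_Processing element."""
    root = ET.fromstring(xml_text)
    params = []
    for processing in root.iter(GMI + "LE_Processing"):
        node = processing.find(GMI + "runTimeParameters/" + GCO + "CharacterString")
        if node is not None and node.text:
            params.append(node.text.strip())
    return params


# Invented process step; the parameter values are hypothetical.
STEP = """<gmi:LE_ProcessStep xmlns:gmi="http://www.isotc211.org/2005/gmi"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmi:processingInformation><gmi:LE_Processing>
    <gmi:runTimeParameters>
      <gco:CharacterString>threshold=0.5; window=3</gco:CharacterString>
    </gmi:runTimeParameters>
  </gmi:LE_Processing></gmi:processingInformation>
</gmi:LE_ProcessStep>"""

print(run_time_parameters(STEP))  # ['threshold=0.5; window=3']
```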
  24. 3. Usage - User feedback
     • The current ISO 19115 has only one small element for user feedback:
       – MD_Usage: brief description of ways in which the resource is currently used or has been used.
  25. MD_Usage - User feedback
     • 1.2% of metadata records (1133) contain MD_Usage entries, filling only:
       – specificUsage and
       – userContactInfo
     • All of them were made by the same institution:
       – Landesvermessung und Geobasisinformation Brandenburg (LGB)
       – Tel +49-331-8844-123, Fax +49-331-8844-16123
       – Heinrich-Mann-Allee 103, Potsdam, Brandenburg 14473, Deutschland
       – kundenservice@geobasis-bb.de
       – http://www.geobasis-bb.de
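The observation that all usage entries come from one institution amounts to grouping `MD_Usage` elements by the organisation named in `userContactInfo`. A minimal sketch, assuming contacts are encoded as `gmd:CI_ResponsibleParty/gmd:organisationName`; the sample record (including its usage text) is illustrative.

```python
import xml.etree.ElementTree as ET
from collections import Counter

GMD = "{http://www.isotc211.org/2005/gmd}"
GCO = "{http://www.isotc211.org/2005/gco}"
ORG_PATH = f"{GMD}userContactInfo/{GMD}CI_ResponsibleParty/{GMD}organisationName/{GCO}CharacterString"


def usage_contacts(xml_texts):
    """Count MD_Usage entries per organisation across many metadata records."""
    counts = Counter()
    for text in xml_texts:
        root = ET.fromstring(text)
        for usage in root.iter(GMD + "MD_Usage"):
            org = usage.find(ORG_PATH)
            name = org.text.strip() if org is not None and org.text else "(no organisation)"
            counts[name] += 1
    return counts


# Illustrative record; the specificUsage text is invented.
USAGE = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo><gmd:MD_DataIdentification>
    <gmd:resourceSpecificUsage><gmd:MD_Usage>
      <gmd:specificUsage><gco:CharacterString>Example usage</gco:CharacterString></gmd:specificUsage>
      <gmd:userContactInfo><gmd:CI_ResponsibleParty>
        <gmd:organisationName><gco:CharacterString>Landesvermessung und Geobasisinformation Brandenburg (LGB)</gco:CharacterString></gmd:organisationName>
      </gmd:CI_ResponsibleParty></gmd:userContactInfo>
    </gmd:MD_Usage></gmd:resourceSpecificUsage>
  </gmd:MD_DataIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>"""

print(usage_contacts([USAGE, USAGE]))
# Counter({'Landesvermessung und Geobasisinformation Brandenburg (LGB)': 2})
```

A single key in the resulting counter corresponds to the situation reported on this slide.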
  26. Conclusions
     • There are many different kinds of quality indicators.
       – The values provided lack a complete description (no units, missing measure name, missing evaluation method).
     • Quality coverage results (per pixel) are almost nonexistent, and the link to the result file is missing.
     • Lineage information is rich in many records, some with more than 100 entries in source or ProcessSteps.
     • There are usage examples, which can serve as user feedback.
     • Current data is enough to demonstrate search and visualization, with some limitations. Good for GeoViQua.
     • Next steps:
       – Assess the quality of quality metadata?
       – Extend this analysis to other capacity catalogues integrated in the EuroGEOSS Broker.
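The missing descriptors listed in the conclusions (measure name, evaluation method, units) can be flagged per quality element. This sketch is an assumption about how such a check might be written, not the authors' procedure. It treats a descriptor as documented whenever the corresponding ISO 19139 tag is present, even if that tag only carries a nil reason, and accepts a unit given either inline or through `xlink:href`.

```python
import xml.etree.ElementTree as ET

GMD = "{http://www.isotc211.org/2005/gmd}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def has_unit(value_unit):
    """A unit may be given inline (child element) or by reference (xlink:href)."""
    return value_unit is not None and (
        len(value_unit) > 0 or value_unit.get(XLINK_HREF) is not None
    )


def missing_descriptors(xml_text):
    """List (DQ_Element subtype, missing descriptors) for every gmd:report."""
    root = ET.fromstring(xml_text)
    findings = []
    for report in root.iter(GMD + "report"):
        for element in report:  # the concrete DQ_Element subtype
            gaps = []
            if element.find(GMD + "nameOfMeasure") is None:
                gaps.append("nameOfMeasure")
            if (element.find(GMD + "evaluationMethodType") is None
                    and element.find(GMD + "evaluationMethodDescription") is None):
                gaps.append("evaluation method")
            for result in element.iter(GMD + "DQ_QuantitativeResult"):
                if not has_unit(result.find(GMD + "valueUnit")):
                    gaps.append("valueUnit")
            findings.append((element.tag.rsplit("}", 1)[-1], gaps))
    return findings


# Invented record with an undocumented positional accuracy element.
QI = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:dataQualityInfo><gmd:DQ_DataQuality>
    <gmd:report><gmd:DQ_AbsoluteExternalPositionalAccuracy>
      <gmd:result><gmd:DQ_QuantitativeResult><gmd:valueUnit/></gmd:DQ_QuantitativeResult></gmd:result>
    </gmd:DQ_AbsoluteExternalPositionalAccuracy></gmd:report>
  </gmd:DQ_DataQuality></gmd:dataQualityInfo>
</gmd:MD_Metadata>"""

print(missing_descriptors(QI))
# [('DQ_AbsoluteExternalPositionalAccuracy', ['nameOfMeasure', 'evaluation method', 'valueUnit'])]
```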
  27. Thank you! Danke! Grazie! Merci! Ευχαριστίες! Diolch! Bedankt! Köszönöm! Ačiū! Благодарам! Спасибі! Ευχαριστίες! Vďaka! Tak! Díky! Tänan! Kiitos! Благодарам! Dzięki! Mulţumiri! Хвала! Tack! Teşekkürler! Спасибі! Спасибо! Obrigado! Takk! Gràcies! Gracias!
