Translation proofing

Translation Proofing – Quantitative Tools for
Connecting Metadata Dialects
Ted Habermann
Director of Earth Science
The HDF Group
thabermann@hdfgroup.org

Metadata in Multiple Dialects
Documentation Repository
ISO 19115, 19115-2, 19119 and extensions
THREDDS
HDF, netCDF (NcML)
FGDC, Data.Gov
SensorML
WCS, WMS, WFS, SOS
Open Provenance Model, PROV
DIF, ECS, ECHO
KML

Translation Lossiness
Documentation dialects generally have significant overlap because the
concepts that are being documented (who, where, what, when, and why?)
are shared across many communities and dialects.
At the same time, there are differences…
[Figure: dialects A and B and their overlap AB, ranging from More Lossy to Less Lossy]
We are familiar with the idea of lossiness with data compression. How can we
quantify the lossiness of a translation?

Characterizing the Source
The distribution of elements in any metadata collection reflects the requirements
of the data providers and users. Some elements are more common (important?)
than others.
This heterogeneity needs to be considered when evaluating the translation.
448 CSDGM Records
161,151 Elements and Attributes
10,713 Place Keywords
1 /metadata/USGSErp/MetadataNotes
264 elements occur < 100 times
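
A distribution like the one summarized above could be tallied along these lines. This is only a minimal sketch: the two records are made up for illustration, the helper names `element_paths` and `distribution` are hypothetical, and it assumes records without XML namespaces (with namespaces, `ElementTree` would prefix each tag with `{uri}`).

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(xml_text):
    """Yield the slash-separated path of every element and attribute in one record."""
    root = ET.fromstring(xml_text)

    def walk(node, prefix):
        path = f"{prefix}/{node.tag}"
        yield path
        for name in node.attrib:
            yield f"{path}/@{name}"
        for child in node:
            yield from walk(child, path)

    yield from walk(root, "")


def distribution(records):
    """Count how often each element or attribute path occurs across all records."""
    counts = Counter()
    for xml_text in records:
        counts.update(element_paths(xml_text))
    return counts


# Two tiny made-up records (illustrative only, not from the collection above).
records = [
    "<metadata><idinfo><keywords><place>"
    "<placekey>Alaska</placekey><placekey>Arctic</placekey>"
    "</place></keywords></idinfo></metadata>",
    "<metadata><idinfo><keywords><place>"
    "<placekey>Hawaii</placekey>"
    "</place></keywords></idinfo><note>draft</note></metadata>",
]

counts = distribution(records)
total = sum(counts.values())                       # all elements and attributes
rare = [p for p, n in counts.items() if n < 2]     # paths that occur only once
```

Sorting `counts.most_common()` gives the head of the distribution, while `rare` picks out the long tail of seldom-used elements.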

Lossiness = Distribution + Crosswalk
In order to calculate the lossiness of a translation we need the actual distribution
of elements in the source and a reference crosswalk that gives the destinations
that the source elements are mapped to.
[Figure: Actual Distribution (collection & community) + Reference Crosswalk (Source to Destination)]
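
One way to hold the two inputs is a counter for the distribution and a plain mapping for the crosswalk. The sketch below is an assumption about representation, not the author's tooling; the paths, counts, and the helper `split_by_crosswalk` are all hypothetical.

```python
from collections import Counter

# Hypothetical distribution: source element path -> number of occurrences.
distribution = Counter({"/src/title": 40, "/src/abstract": 35, "/src/notes": 5})

# Hypothetical reference crosswalk: source path -> destination path.
crosswalk = {
    "/src/title": "/dst/identification/title",
    "/src/abstract": "/dst/identification/abstract",
}


def split_by_crosswalk(distribution, crosswalk):
    """Separate source occurrences into those with a destination and those without."""
    mapped = {p: n for p, n in distribution.items() if p in crosswalk}
    unmapped = {p: n for p, n in distribution.items() if p not in crosswalk}
    return mapped, unmapped


mapped, unmapped = split_by_crosswalk(distribution, crosswalk)
```

Keeping the unmapped paths, rather than only a count, shows which source elements a translation would drop.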

Three Examples
January 8-10, 2014 ESIP Winter 2014
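Example 1: every element is in the crosswalk
Element   #     %      Translated?   % Translated
A         134   66%    1             66%
B         50    25%    1             25%
C         20    10%    1             10%
Total     204   100%                 100%
Element A occurs 134 times and makes up 66% of the source
Element B occurs 50 times and makes up 25% of the source
Element C occurs 20 times and makes up 10% of the source
100% elements translated: lossiness = 0%

Example 2: element B is not in the crosswalk
Element   #     %      Translated?   % Translated
A         134   66%    1             66%
B         50    25%    0             0%
C         20    10%    1             10%
Total     204   100%                 75%

Example 3: element C is not in the crosswalk
Element   #     %      Translated?   % Translated
A         134   66%    1             66%
B         50    25%    1             25%
C         20    10%    0             0%
Total     204   100%                 91%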
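
The three examples can be reproduced with a few lines of code. This is a sketch under the assumption that lossiness is one minus the translated share of occurrences; the function name `lossiness` is hypothetical. Because the percentages on the slide are rounded, the exact fractions differ from them slightly (about 24.5% and 9.8% lossiness for the second and third examples).

```python
def lossiness(counts, in_crosswalk):
    """Return 1 minus the share of source occurrences whose element is in the crosswalk."""
    total = sum(counts.values())
    translated = sum(n for element, n in counts.items() if element in in_crosswalk)
    return 1 - translated / total


# Occurrence counts from the examples above.
counts = {"A": 134, "B": 50, "C": 20}

example_1 = lossiness(counts, {"A", "B", "C"})  # everything translated
example_2 = lossiness(counts, {"A", "C"})       # B is lost
example_3 = lossiness(counts, {"A", "B"})       # C is lost
```

Losing the common element B costs far more than losing the rare element C, which is why the source distribution has to be part of the measure.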

Calculating Lossiness
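Lossiness is one minus the fraction of source occurrences that the crosswalk carries
to the destination:

\[
\text{Lossiness} = 1 - \sum_{n=1}^{N} \frac{\text{Number of Occurrences}_n}{\text{Total Number of Elements}} \times c_n,
\qquad
c_n = \begin{cases} 1 & \text{if element } n \text{ is in the crosswalk} \\ 0 & \text{if not} \end{cases}
\]

Here N is the number of elements. The occurrence fraction comes from the Actual
Distribution (collection & community), and c_n comes from the Reference Crosswalk
(Source to Destination).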
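
As a worked instance of this calculation, take the earlier example in which B is the only element missing from the crosswalk (A = 134, B = 50, C = 20 occurrences, 204 in total). Exact fractions are used here, so the result differs slightly from the rounded percentages:

\[
\text{Lossiness} = 1 - \left( \frac{134}{204} \cdot 1 + \frac{50}{204} \cdot 0 + \frac{20}{204} \cdot 1 \right)
= 1 - \frac{154}{204} = \frac{50}{204} \approx 24.5\%
\]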

Questions?
tedhabermann@hdfgroup.org

Acknowledgements
This work was partially supported by contract number NNG10HP02C from NASA.
Any opinions, findings, conclusions, or recommendations expressed in this material are
those of the author and do not necessarily reflect the views of NASA or The HDF Group.
