Translation Proofing – Quantitative Tools for
Connecting Metadata Dialects
Ted Habermann
Director of Earth Science
The HDF Group
thabermann@hdfgroup.org
Metadata in Multiple Dialects
Documentation Repository
ISO 19115, 19115-2, 19119 and extensions
THREDDS
HDF, netCDF (NcML)
FGDC, Data.Gov
SensorML
WCS, WMS, WFS, SOS
Open Provenance Model, PROV
DIF, ECS, ECHO
KML
Translation Lossiness
Documentation dialects generally have significant overlap because the
concepts that are being documented (who, where, what, when, and why?)
are shared across many communities and dialects.
At the same time, there are differences…
[Figure: dialects A and B and their overlap AB, ranging from More Lossy to Less Lossy]
We are familiar with the idea of lossiness from data compression. How can we
quantify the lossiness of a translation?
Characterizing the Source
The distribution of elements in any metadata collection reflects the requirements
of the data providers and users. Some elements are more common (important?)
than others.
This heterogeneity needs to be considered when evaluating the translation.
448 CSDGM Records
161,151 Elements and Attributes
10,713 Place Keywords
1 /metadata/USGSErp/MetadataNotes
264 elements occur < 100 times
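
One way to obtain such a distribution is to walk every record and count each element and attribute path. The sketch below is a minimal illustration, not the tool used to produce the counts above. The function names and the two tiny records are made up, and it assumes each record is available as an XML string.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def count_paths(xml_text: str) -> Counter:
    """Count every element and attribute path in one XML record."""
    counts = Counter()

    def walk(node, parent_path):
        # Namespaced tags would appear as "{uri}name"; these records have none.
        path = f"{parent_path}/{node.tag}"
        counts[path] += 1
        for attr in node.attrib:
            counts[f"{path}/@{attr}"] += 1
        for child in node:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return counts


def collection_distribution(records) -> Counter:
    """Sum the per-record path counts over a whole collection."""
    total = Counter()
    for xml_text in records:
        total.update(count_paths(xml_text))
    return total


# Two made-up records, for illustration only.
records = [
    "<metadata><idinfo><keywords><place>"
    "<placekey>Alaska</placekey><placekey>Nome</placekey>"
    "</place></keywords></idinfo></metadata>",
    "<metadata><idinfo><keywords><place>"
    "<placekey>Utah</placekey>"
    "</place></keywords></idinfo></metadata>",
]

distribution = collection_distribution(records)
for path, n in distribution.most_common():
    print(n, path)
```

Sorting by count puts the most common paths first, which makes the heterogeneity of a collection easy to see.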
Lossiness = Distribution + Crosswalk
[Figure: Actual Distribution (collection & community) + Reference Crosswalk]
In order to calculate the lossiness of a translation we need the actual distribution
of elements in the source and a reference crosswalk that gives the destinations
that the source elements are mapped to.
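
The two inputs can be pictured as simple data structures: a count per source element and a mapping from source element to destination. The sketch below is only an illustration. The element names, destinations, and counts are placeholders, not taken from any real crosswalk, and `unmapped_elements` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical actual distribution: occurrences of each source element.
distribution = Counter({"src/title": 40, "src/abstract": 35, "src/notes": 5})

# Hypothetical reference crosswalk: source element -> destination element.
crosswalk = {
    "src/title": "dst/citation/title",
    "src/abstract": "dst/abstract",
    # "src/notes" has no destination in this made-up crosswalk.
}


def unmapped_elements(distribution, crosswalk):
    """Return (element, count) pairs with no destination, most frequent first."""
    missing = Counter(
        {element: n for element, n in distribution.items() if element not in crosswalk}
    )
    return missing.most_common()


print(unmapped_elements(distribution, crosswalk))
```

Listing the unmapped elements by frequency shows which gaps in a crosswalk affect the most content.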
Three Examples
January 8-10, 2014 ESIP Winter 2014
Example 1

Element     #     %    Translated?   % Translated
A         134    66%        1            66%
B          50    25%        1            25%
C          20    10%        1            10%
Total     204   100%                    100%

Element A occurs 134 times and makes up 66% of the source.
Element B occurs 50 times and makes up 25% of the source.
Element C occurs 20 times and makes up 10% of the source.

Example 2

Element     #     %    Translated?   % Translated
A         134    66%        1            66%
B          50    25%        0             0%
C          20    10%        1            10%
Total     204   100%                     75%

Example 3

Element     #     %    Translated?   % Translated
A         134    66%        1            66%
B          50    25%        1            25%
C          20    10%        0             0%
Total     204   100%                     91%

Example 1: 100% of elements translated, lossiness = 0%
Example 2: 75% of elements translated, lossiness = 25%
Example 3: 91% of elements translated, lossiness = 9%
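
The three examples can be reproduced with a few lines of code. This is a minimal sketch, assuming the crosswalk is reduced to the set of source elements that have a destination; the `lossiness` function and the example labels are illustrative names. It works from the raw counts without intermediate rounding, so its results can differ by up to about a percentage point from the rounded figures in the tables.

```python
def lossiness(distribution, crosswalk):
    """Fraction of source occurrences whose element is not in the crosswalk."""
    total = sum(distribution.values())
    if total == 0:
        raise ValueError("the source distribution is empty")
    translated = sum(n for element, n in distribution.items() if element in crosswalk)
    return 1 - translated / total


# Counts from the three examples.
distribution = {"A": 134, "B": 50, "C": 20}

# Each crosswalk is reduced to the set of elements that have a destination.
examples = {
    "all translated": {"A", "B", "C"},
    "B not translated": {"A", "C"},
    "C not translated": {"A", "B"},
}

results = {name: lossiness(distribution, cw) for name, cw in examples.items()}
for name, value in results.items():
    print(f"{name}: lossiness = {value:.1%}")
```

Weighting by occurrences is the point of the calculation: dropping B costs more than dropping C, even though each case loses exactly one element.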
Calculating Lossiness
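\[
\text{Lossiness} = 1 - \sum_{n=1}^{\text{number of elements}}
\frac{\text{Number of Occurrences}_n}{\text{Total Number of Elements}}
\times
\begin{cases}
1 & \text{if in crosswalk} \\
0 & \text{if not}
\end{cases}
\]

The first factor comes from the Actual Distribution (collection & community); the second comes from the Reference Crosswalk (Source to Destination).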
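
As a worked instance of this calculation, take the second of the three examples, in which element B is not in the crosswalk. The numbers are the counts from that example, and the result is left unrounded, so it comes out slightly below the 25% shown there.

```latex
\[
\text{Lossiness}
= 1 - \left( \frac{134}{204}\cdot 1 + \frac{50}{204}\cdot 0 + \frac{20}{204}\cdot 1 \right)
= 1 - \frac{154}{204}
= \frac{50}{204}
\approx 0.245
\]
```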
Questions?
tedhabermann@hdfgroup.org
Acknowledgements
This work was partially supported by contract number NNG10HP02C from NASA.
Any opinions, findings, conclusions, or recommendations expressed in this material are
those of the author and do not necessarily reflect the views of NASA or The HDF Group.
