Communities use many different dialects to document their data. We need to be able to translate between these dialects and to understand how much is lost in translation.
The HDF Product Designer – Interoperability in the First Mile (Ted Habermann)
Interoperable data have been a long-standing goal in many scientific communities. The recent growth in analysis, visualization and mash-up applications that expect data stored in a standardized manner has brought the interoperability issue to the fore. On the other hand, producing interoperable data is often regarded as a sideline task in a typical research team, one for which resources are not readily available. The HDF Group is developing a software tool aimed at lessening the burden of creating data in standards-compliant, interoperable HDF5 files. The tool, named HDF Product Designer, lowers the threshold for designing such files by providing a user interface that combines the rich HDF5 feature set with applicable metadata conventions. Users can quickly devise new HDF5 files while at the same time seamlessly incorporating the latest best practices and conventions from their community. That is what the term interoperability in the first mile means: enabling generation of interoperable data in HDF5 files from the outset of their production. The tool also incorporates collaborative features, allowing a team approach to file design as well as easy transfer of best practices as they are developed. The current state of the tool and the plans for future development will be presented. Constructive input from interested parties is always welcome.
Science platforms are made up of (at least) four planks: data formats, services, tools and conventions. I focus here on formats and conventions, specifically the HDF5 format, already used in many disciplines, and the Climate and Forecast (CF) and HDF-EOS conventions. Many science disciplines have already agreed on HDF as the preferred format for storing and sharing data. It is well established in high performance computing and supports arbitrary grouping and annotation. Community conventions layered on top of the format are critical for making data useful. The CF conventions were created for relatively simple gridded data types, while the HDF-EOS conventions originally considered more complex data (swaths). Making simple conventions more complex makes adoption more difficult. Community input and the need for stable data processing systems must be balanced in the governance of conventions.
Wikis, Rubrics and Views: An Integrated Approach to Improving Documentation (Ted Habermann)
For many years scientists and data managers have focused on creating metadata that supports the discovery of available data. This is important, but once data sets are discovered, users need metadata that supports use and understanding of those data. This talk describes a system developed to support the required metadata improvements using wikis, rubrics, and metadata views. The wikis provide a mechanism for the community to record experiences and lessons learned and to provide high-quality examples. Rubrics provide a mechanism for consistent and clear quantitative evaluation of the completeness of metadata records. The result displays include integrated links to the wiki. Metadata views present records with connections to the wiki and support ongoing interactive learning. These tools can be used with metadata from any standard and can facilitate translation of the metadata between multiple standards.
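The idea of a rubric as a quantitative completeness check can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system described in the talk: the field names, the weights, and the `score_record` helper are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RubricItem:
    name: str    # hypothetical metadata field name
    weight: int  # hypothetical points awarded when the field is populated


def score_record(record: dict, rubric: list[RubricItem]) -> tuple[int, int, list[str]]:
    """Return (earned points, possible points, names of missing fields)."""
    earned = 0
    missing = []
    for item in rubric:
        value = record.get(item.name)
        # A field counts only if it is present and not empty.
        if value not in (None, "", [], {}):
            earned += item.weight
        else:
            missing.append(item.name)
    possible = sum(item.weight for item in rubric)
    return earned, possible, missing


# Hypothetical rubric and record, for illustration only.
rubric = [RubricItem("title", 1), RubricItem("abstract", 1), RubricItem("lineage", 2)]
record = {"title": "Sea surface temperature", "abstract": ""}
earned, possible, missing = score_record(record, rubric)
print(f"{earned}/{possible} points; missing: {missing}")
```

In a setup like the one described above, each name in the list of missing fields could link to the wiki page that explains how to fill it in.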
The ISO Metadata Standards include the capability to add citations to many kinds of external resources. This is very important for providing complete documentation required to understand and reproduce scientific results.
New data access paradigms support a variety of human and machine access paths with data servers (THREDDS, https://www.unidata.ucar.edu/software/thredds/current/tds/ and Hyrax, http://opendap.org) that support multiple services for a given dataset. We need metadata that can describe those services and unambiguously differentiate between access paths for humans and for machines. The ISO 19115 metadata standard includes service metadata and allows data and the services for those data to be described in the same record. I propose that we use the service metadata for machine access and the more traditional distribution information for human access. This talk was presented at the ESIP (esipfed.org) meeting during January 2014.
NASA's Earth Observing System (EOS) archive includes data collected over many years by many satellite instruments. These data are stored in the HDF format, which holds both data and metadata. The content of the metadata was examined for compliance with a set of conventions developed by the NASA science community at the beginning of the EOS Project (the HDF-EOS conventions). The initial results show that ~50% of the data files and 76% of the datasets have metadata that allows them to be used easily in standard tools. This talk was presented at the ESIP (esipfed.org) meeting during January 2014.
The NASA Earth Science Data and Information System (ESDIS) is migrating documentation for their data and products towards International Standards developed by ISO Technical Committee 211 (ISO/TC211). In order to do this effectively, NASA must understand and participate in the ISO process. This presentation was given at a NASA ISO Seminar during November 2012. It outlines the ISO standards process and describes some extensions to the ISO standards that are being proposed to address ESDIS requirements not addressed in the original standard.
We are interested in developing a standard method for writing ISO TC211 compliant metadata into HDF data files. This presentation shows some initial workflows for this using the HDF Product Designer.
For many years metadata development activities have focused on developing and sharing metadata for discovering data. This is important. Once data are discovered, metadata supporting use and understanding become important. Efforts to encourage scientists and data providers to create those metadata have had limited success. This talk describes some approaches and tools for supporting the organizational change efforts required to integrate use and understanding metadata into organizational cultures. These approaches are described in terms of the ideas presented in Switch: How to Change Things When Change is Hard.
The HDF format is the foundation for sharing data in many communities that have created domain-specific conventions on top of HDF. This presentation was given at the Winter meeting of the Earth Science Information Partnership (ESIP).
HDF Augmentation: Interoperability in the Last Mile (Ted Habermann)
Science data files are generally written to serve well-defined purposes for small science teams. In many cases, the organization of the data and the metadata are designed for custom tools developed and maintained by and for the team. Using these data outside of this context often involves restructuring, re-documenting, or reformatting the data. This expensive and time-consuming process usually prevents data reuse and thus considerably decreases the total life-cycle value of the data. If the data are unique or critically important to solving a particular problem, they can be modified into a more generally usable form, or metadata can be added in order to enable reuse. This augmentation process can be done to enhance data for the intended purpose or for a new purpose, to make the data available to new tools and applications, to make the data more conventional or standard, or to simplify preservation of the data. The HDF Group has addressed augmentation needs in many ways: by adding extra information, by renaming objects or moving them around in the file, by reducing the complexity of the organization, and sometimes by hiding data objects that are not understood by specific applications. In some cases these approaches require rewriting the data into new files, and in other cases the augmentation can be done externally, without affecting the original file. We will describe and compare several examples of each approach.
ISO Metadata Improvements - Questions and Answers (Ted Habermann)
The ISO Standards for describing geospatial data, services, and other resources are changing. These slides describe a few of these changes in terms of documentation needs and how the new standards address these needs. I presented these slides at a recent webinar that is available at https://www.youtube.com/watch?v=un-PtJLclIM&feature=youtu.be
Can ISO 19157 support current NASA data quality metadata? (Ted Habermann)
ISO 19157 provides a powerful framework for describing quality of Earth science datasets. As NASA migrates towards using that standard, it is important to understand whether and how existing data quality content fits into the ISO 19157 model. This talk demonstrates that fit and concludes that ISO 19157 can include all existing content and also includes new capabilities that can be very useful for all kinds of NASA data users.
If you are translating metadata between dialects, do you know what you are losing? There is a way to identify it and to quantitatively characterize the lossiness of the translation.
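One simple way to put a number on translation loss is to count the populated source fields that have no counterpart in the target dialect. The sketch below is an assumption made for illustration, not the method from the talk: the `translation_loss` helper, the crosswalk structure, and all field names are hypothetical.

```python
def translation_loss(source_fields, crosswalk):
    """Return (sorted list of lost fields, fraction of source fields lost).

    source_fields: set of field names populated in the source record.
    crosswalk: dict mapping a source field name to its target field name,
               or to None when the target dialect has no equivalent.
    """
    # A field is lost if the crosswalk does not mention it or maps it to nothing.
    lost = sorted(f for f in source_fields if not crosswalk.get(f))
    fraction = len(lost) / len(source_fields) if source_fields else 0.0
    return lost, fraction


# Hypothetical source record fields and crosswalk, for illustration only.
source_fields = {"title", "abstract", "processing_level", "lineage"}
crosswalk = {"title": "title", "abstract": "summary", "lineage": None}

lost, fraction = translation_loss(source_fields, crosswalk)
print(f"lost fields: {lost}; lossiness: {fraction:.0%}")
```

A fuller treatment would probably weight fields by importance, since losing a title matters more than losing an optional keyword.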
Impact of Tool Support in Patch Construction (Dongsun Kim)
Anil Koyuncu, Tegawendé F. Bissyandé, Dongsun Kim, Jacques Klein, Martin Monperrus, and Yves Le Traon, “Impact of Tool Support in Patch Construction,” in Proceedings of the 26th International Symposium on Software Testing and Analysis (ISSTA 2017), Santa Barbara, California, United States, July 10-14, 2017.
Everyone talks about "data-centricity," but what does that mean in practical terms? It means that you have to have a well-defined ontology that can capture the information needed to describe the architecture or system you work with or want to create. An ontology is simply the taxonomy of entity classes (bins of information) and how those classes are related to each other. In this webinar, we will discuss a relatively new ontology, the Lifecycle Modeling Language (LML), which provides the basis for Innoslate's database schema. We will discuss each entity class and why it was developed. Dr. Steven Dam, who is the Secretary of the LML Steering Committee, will present the details of the language and how it relates to other ontologies and languages, such as the DoDAF MetaModel 2.0 and SysML. He will also discuss ways to visualize this information to enhance understanding and how to use it to make decisions about the architecture or system.
Innoslate 101: A Webinar for New Users (SarahCraig7)
"Innoslate 101: A Webinar for New Users." Dr. Steven Dam is going to show you just how easy it is to learn Innoslate. He will walk you through the ins and outs of the tool and show you how you can become an expert Innoslate user in no time.
What Is Covered?
- Basic Navigation and Usage
- Understanding the Different Views
- The Lifecycle of a System or Product
- Creating a requirements document
- Developing physical and functional models
- Executing functional models
- Reports and more
Innoslate is the model-based systems engineering solution of the future. It is an all-in-one software package made for systems engineers and program managers that keeps your requirements management, modeling and simulation, test management, and more all in one place. Smarter, more successful systems start here. Create a trial account at innoslate.com/signup.
Slides for the following paper: NLP Data Cleansing Based on Linguistic Ontology Constraints
Abstract: Linked Data comprises an unprecedented volume of structured data on the Web and is being adopted by an increasing number of domains. However, the varying quality of published data forms a barrier to further adoption, especially for Linked Data consumers. In this paper, we extend a previously developed methodology of Linked Data quality assessment, which is inspired by test-driven software development. Specifically, we enrich it with ontological support and different levels of result reporting and describe how the method is applied in the Natural Language Processing (NLP) area. Compared to other domains, such as biology, NLP is a late Linked Data adopter. However, it has seen a steep rise of activity in the creation of data and ontologies. NLP data quality assessment has become an important need for NLP datasets. In our study, we analysed 11 datasets using the lemon and NIF vocabularies in 277 test cases and point out common quality issues.
A Comparison of NER Tools w.r.t. a Domain-Specific Vocabulary (Timm Heuss)
Presentation held at SEMANTiCS 2014, regarding this paper: http://doi.acm.org/10.1145/2660517.2660520
In this paper we compare several state-of-the-art Linked Data Knowledge Extraction tools with regard to their ability to recognise entities of a controlled, domain-specific vocabulary. This includes tools that offer APIs as a Service, locally installed platforms, and a UIMA-based approach as reference. We evaluate under realistic conditions, with natural language source texts from keywording experts of the Städel Museum Frankfurt. The goal is to find first hints as to which tool approach or strategy is more convincing for domain-specific tagging and annotation, towards a working solution that is demanded by GLAMs world-wide.
Product Derivation is a key activity in Software Product Line Engineering. During this process, derivation operators modify or create core assets (e.g., model elements, source code instructions, components) by adding, removing or substituting them according to a given configuration. The result is a derived product that generally needs to conform to a programming or modeling language. Some operators lead to invalid products when applied to certain assets; others do not. Knowing this in advance can help to use them better, but it is challenging, especially if we consider assets expressed in extensive and complex languages such as Java. In this paper, we empirically answer the following question: which product line operators, applied to which program elements, can synthesize variants of programs that are incorrect, correct or perhaps even conforming to test suites? We implement source code transformations based on the derivation operators of the Common Variability Language. We automatically synthesize more than 370,000 program variants from a set of 8 real large Java projects (up to 85,000 lines of code), obtaining an extensive panorama of the sanity of the operations.
Paper was presented at SPLC'15
Presentation of the main IR models
Presentation of our submission to TREC KBA 2014 (Entity oriented information retrieval), in partnership with Kware company (V. Bouvier, M. Benoit)
A talk I gave at the MMDS workshop June 2014 on the Myria system as well as some of Seung-Hee Bae's work on scalable graph clustering.
https://mmds-data.org/
Hadoop is commonly used for processing large swaths of data in batch. While many of the necessary building blocks for data processing exist within the Hadoop ecosystem – HDFS, MapReduce, HBase, Hive, Pig, Oozie, and so on – it can be a challenge to assemble and operationalize them as a production ETL platform. This presentation covers one approach to data ingest, organization, format selection, process orchestration, and external system integration, based on collective experience acquired across many production Hadoop deployments.
Text as Data: processing the Hebrew Bible (Dirk Roorda)
The merits of stand-off markup (LAF) versus inline markup (TEI) for processing text as data. Ideas applied to work with the Hebrew Bible, resulting in tools for researchers and end-users.
What are greenhouse gases and how many gases affect the Earth? (moosaasad1975)
What are greenhouse gases, how do they affect the Earth and its environment, what is the future of the environment and the Earth, and how are the weather and the climate affected?
Nutraceutical market, scope and growth: Herbal drug technology (Lokesh Patil)
As consumer awareness of health and wellness rises, the nutraceutical market, which includes goods like functional foods, drinks, and dietary supplements that provide health advantages beyond basic nutrition, is growing significantly. As healthcare expenses rise, the population ages, and people increasingly want natural and preventative health solutions, this industry is expanding quickly. Further driving market expansion are product formulation innovations and the use of cutting-edge technology for customized nutrition. With its worldwide reach, the nutraceutical industry is expected to keep growing and to provide significant opportunities for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Studia Poinsotiana
I Introduction
II Subalternation and Theology
III Theology and Dogmatic Declarations
IV The Mixed Principles of Theology
V Virtual Revelation: The Unity of Theology
VI Theology as a Natural Science
VII Theology’s Certitude
VIII Conclusion
Notes
Bibliography
All the contents are fully attributable to the author, Doctor Victor Salas. Should you wish to get this text republished, get in touch with the author or the editorial committee of the Studia Poinsotiana. Insofar as possible, we will be happy to broker your contact.
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN.Sérgio Sacani
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
The ability to recreate computational results with minimal effort and actionable metrics provides a solid foundation for scientific research and software development. When people can replicate an analysis at the touch of a button using open-source software, open data, and methods to assess and compare proposals, it significantly eases verification of results, engagement with a diverse range of contributors, and progress. However, we have yet to fully achieve this; there are still many sociotechnical frictions.
Inspired by David Donoho's vision, this talk aims to revisit the three crucial pillars of frictionless reproducibility (data sharing, code sharing, and competitive challenges) with the perspective of deep software variability.
Our observation is that multiple layers — hardware, operating systems, third-party libraries, software versions, input data, compile-time options, and parameters — are subject to variability that exacerbates frictions but is also essential for achieving robust, generalizable results and fostering innovation. I will first review the literature, providing evidence of how the complex variability interactions across these layers affect qualitative and quantitative software properties, thereby complicating the reproduction and replication of scientific studies in various fields.
I will then present some software engineering and AI techniques that can support the strategic exploration of variability spaces. These include the use of abstractions and models (e.g., feature models), sampling strategies (e.g., uniform, random), cost-effective measurements (e.g., incremental build of software configurations), and dimensionality reduction methods (e.g., transfer learning, feature selection, software debloating).
I will finally argue that deep variability is both the problem and solution of frictionless reproducibility, calling the software science community to develop new methods and tools to manage variability and foster reproducibility in software systems.
Exposé invité Journées Nationales du GDR GPL 2024
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitor common gases, weather parameters, particulates.
1. Translation Proofing – Quantitative Tools for Connecting Metadata Dialects
Ted Habermann
Director of Earth Science
The HDF Group
thabermann@hdfgroup.org
2. Metadata in Multiple Dialects
Documentation Repository
ISO 19115, 19115-2, 19119 and extensions
THREDDS
HDF, netCDF (NcML)
FGDC, Data.Gov
SensorML
WCS, WMS, WFS, SOS
Open Provenance Model, PROV
DIF, ECS, ECHO
KML
3. Translation Lossiness
Documentation dialects generally have significant overlap because the concepts that are being documented (who, where, what, when, and why?) are shared across many communities and dialects.
At the same time, there are differences…
[Figure: overlap (AB) between dialects A and B, ranging from more lossy to less lossy]
We are familiar with the idea of lossiness in data compression. How can we quantify the lossiness of a translation?
4. Characterizing the Source
The distribution of elements in any metadata collection reflects the requirements
of the data providers and users. Some elements are more common (important?)
than others.
This heterogeneity needs to be considered when evaluating the translation.
Example collection: 448 CSDGM records with 161,151 elements and attributes.
Place Keywords occur 10,713 times.
/metadata/USGSErp/MetadataNotes occurs 1 time.
264 elements occur fewer than 100 times.
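A minimal sketch of how such a distribution could be tallied, assuming the records are available as XML strings. The two sample records are hypothetical and heavily abbreviated, and the helper name `element_distribution` is an assumption made for illustration; only the Python standard library is used. Attributes are counted alongside elements, as in the totals above.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated records (not real metadata).
RECORDS = [
    "<metadata><idinfo><keywords><place><placekey>Alaska</placekey>"
    "<placekey>Yukon</placekey></place></keywords></idinfo></metadata>",
    "<metadata><idinfo><keywords><place><placekey>Utah</placekey>"
    "</place></keywords></idinfo><USGSErp><MetadataNotes>n/a</MetadataNotes>"
    "</USGSErp></metadata>",
]


def element_distribution(xml_records):
    """Count how often each element or attribute path occurs in a collection."""
    counts = Counter()

    def walk(node, parent_path):
        path = f"{parent_path}/{node.tag}"
        counts[path] += 1
        # Attributes are recorded as path/@name.
        for attr in node.attrib:
            counts[f"{path}/@{attr}"] += 1
        for child in node:
            walk(child, path)

    for record in xml_records:
        walk(ET.fromstring(record), "")
    return counts


distribution = element_distribution(RECORDS)
# List paths from most to least common.
for path, n in distribution.most_common():
    print(f"{n:3d}  {path}")
```

Sorting by count makes the heterogeneity visible: a few paths dominate while others appear only once.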
5. Lossiness = Distribution + Crosswalk
[Figure: actual distribution (collection & community) + reference crosswalk (source to destination)]
In order to calculate the lossiness of a translation we need the actual distribution of elements in the source and a reference crosswalk that gives the destinations that the source elements are mapped to.
6. Three Examples
January 8-10, 2014 ESIP Winter 2014 6
Example 1: all elements are in the crosswalk
Element   #     %      Translated?   % Translated
A         134   66%    1             66%
B         50    25%    1             25%
C         20    10%    1             10%
Total     204   100%                 100%
Element A occurs 134 times and makes up 66% of the source
Element B occurs 50 times and makes up 25% of the source
Element C occurs 20 times and makes up 10% of the source
Example 2: element B is not in the crosswalk
Element   #     %      Translated?   % Translated
A         134   66%    1             66%
B         50    25%    0             0%
C         20    10%    1             10%
Total     204   100%                 75%
Example 3: element C is not in the crosswalk
Element   #     %      Translated?   % Translated
A         134   66%    1             66%
B         50    25%    1             25%
C         20    10%    0             0%
Total     204   100%                 91%
Example 1: 100% of elements translated, lossiness = 0%
Example 2: 75% of elements translated, lossiness = 25%
Example 3: 91% of elements translated, lossiness = 9%
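The arithmetic behind the three examples can be checked with a short sketch. The dictionary keys and the function name `percent_translated` are illustrative assumptions; the counts and 0/1 flags are taken from the tables. Working from the raw counts gives about 75.5% and 90.2% for the second and third examples, so the 75% and 91% shown on the slide differ from these only through rounding.

```python
# (element, occurrences, translated flag) for the three examples.
EXAMPLES = {
    "all translated": [("A", 134, 1), ("B", 50, 1), ("C", 20, 1)],
    "B not translated": [("A", 134, 1), ("B", 50, 0), ("C", 20, 1)],
    "C not translated": [("A", 134, 1), ("B", 50, 1), ("C", 20, 0)],
}


def percent_translated(rows):
    """Share of all element occurrences that survive the translation, in percent."""
    total = sum(count for _, count, _ in rows)
    translated = sum(count * flag for _, count, flag in rows)
    return 100.0 * translated / total


results = {name: percent_translated(rows) for name, rows in EXAMPLES.items()}
for name, pct in results.items():
    print(f"{name}: {pct:.1f}% translated, lossiness = {100 - pct:.1f}%")
```

Losing the rarer element C costs less than losing B, even though in both cases exactly one of three element types is missing from the crosswalk.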
7. Calculating Lossiness
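\[
\text{Lossiness} = 1 - \sum_{n=1}^{\text{number of elements}} \frac{\text{Number of Occurrences}_n}{\text{Total Number of Elements}} \times \begin{cases} 1 & \text{if in crosswalk} \\ 0 & \text{if not} \end{cases}
\]
The first factor comes from the actual distribution (collection & community) of the source; the second comes from the reference crosswalk between source and destination.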
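One way the calculation could be applied to a whole collection is sketched below. It assumes the distribution is a mapping from source element to occurrence count and the crosswalk is a mapping from source element to destination element. The function name `lossiness`, the element names, the counts, and the `dest:` destination names are all hypothetical. The sketch sums the shares of the elements missing from the crosswalk, which equals one minus the translated share, and keeps the per-element losses so the costliest gaps can be ranked.

```python
from collections import Counter


def lossiness(distribution, crosswalk):
    """Return (lossiness, per-element losses) for a source distribution.

    distribution: mapping of source element -> number of occurrences.
    crosswalk:    mapping of source element -> destination element; an
                  element that is absent (or mapped to None) is lost.
    """
    total = sum(distribution.values())
    if total == 0:
        raise ValueError("empty distribution")
    # Share of the source carried by each element with no destination.
    lost = {
        element: count / total
        for element, count in distribution.items()
        if crosswalk.get(element) is None
    }
    return sum(lost.values()), lost


# Hypothetical source counts and destination names, for illustration only.
source = Counter({"title": 40, "abstract": 40, "placekey": 100, "notes": 20})
reference = {
    "title": "dest:title",
    "abstract": "dest:abstract",
    "placekey": "dest:keyword",
}

score, lost = lossiness(source, reference)
print(f"lossiness = {score:.1%}")
for element, share in sorted(lost.items(), key=lambda kv: -kv[1]):
    print(f"  {element}: {share:.1%} of the source is lost")
```

Returning the per-element losses alongside the total shows which additions to the crosswalk would reduce lossiness the most.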
9. Acknowledgements
This work was partially supported by contract number NNG10HP02C from NASA.
Any opinions, findings, conclusions, or recommendations expressed in this material are
those of the author and do not necessarily reflect the views of NASA or The HDF Group.