The document discusses the importance of physical samples in scientific research and the problems associated with poor sample tracking and metadata. It promotes the use of the International GeoSample Number (IGSN) and the System for Earth Sample Registration (SESAR) to address these issues. The IGSN provides a globally unique and persistent identifier for samples, allowing their discovery and linking to related data and publications. SESAR is a database that catalogs sample metadata and assigns IGSNs. Adopting these systems can help make samples and their data FAIR by improving access, interoperability, and citation of samples.
USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 3 - Gianpaolo Coro
An e-Infrastructure is a distributed network of service nodes, residing on multiple sites and managed by one or more organizations. e-Infrastructures allow scientists residing in distant places to collaborate. They offer a multiplicity of facilities as-a-service, supporting data sharing and usage at different levels of abstraction, e.g. data transfer, data harmonization, and data processing workflows. e-Infrastructures are gaining an important place in the field of biodiversity conservation. Their computational capabilities help scientists to reuse models, obtain results in a shorter time, and share these results with colleagues. They are also used to access several heterogeneous biodiversity catalogues.
In this course, the D4Science e-Infrastructure will be used to conduct experiments in the field of biodiversity conservation. D4Science hosts models and contributions by several international organizations involved in the biodiversity conservation field. The course will give students an overview of the models, practices, and methods that large international organizations like FAO and UNESCO apply by means of D4Science. At the same time, the course will introduce students to the basic concepts underlying e-Infrastructures, Virtual Research Environments, data sharing, and experiment reproducibility.
Data Facilities Workshop - Panel on Current Concepts in Data Sharing & Intero... - EarthCube
This series of presentations was given at the EarthCube Data Facilities End-User Workshop held January 15-17, 2014 in Washington, DC. This workshop provided a forum to discuss the unique requirements and challenges associated with developing the communication, collaboration, interoperability, and governance structures that will be required to build EarthCube in conjunction with existing and emerging NSF/GEO facilities.
This panel and discussion, specifically, outlined and explained several current concepts in data sharing and interoperability, featuring presentations by:
Paul Morin (UMN): Polar Cyberinfrastructure
Don Middleton (UCAR): Atmospheric/Climate
Kerstin Lehnert (LDEO): Domain Repositories & Physical Samples
David Schindel (CBOL, GRBio): Biological Perspective & Collections
Hank Loescher (NEON): Observation Networks
Daniel Fuka (Virginia Tech) and Ruth Duerr (NSIDC): Brokering
Ilya Zaslavsky (UCSD): Cross-Domain Interoperability
IGSN: The International Geo Sample Number (DFG Roundtable) - Kerstin Lehnert
This presentation provides an overview of the rationale for the IGSN, of the organizational structure and architecture of the IGSN e.V., and the System for Earth Sample Registration.
Dealing with heterogeneous data to improve our knowledge of biodiversity dynamics and ecosystem function: perspectives from synthesis projects: presented by Orlane Anneville for GEISHA (Global evaluation of the impacts of storms on freshwater habitat and structure of phytoplankton assemblages) at the sfécologie conference 2018.
For more information on the group: http://www.cesab.org/index.php/fr/projets-en-cours/projets-2015/138-geisha
Text (personal views position statement) to accompany presentation on what research infrastructures really need for data, XLDB-Europe, 8-10th June 2011, Edinburgh
USING E-INFRASTRUCTURES FOR BIODIVERSITY CONSERVATION - Module 5 - Gianpaolo Coro
An e-Infrastructure is a distributed network of service nodes, residing on multiple sites and managed by one or more organizations. e-Infrastructures allow scientists residing in distant places to collaborate. They offer a multiplicity of facilities as-a-service, supporting data sharing and usage at different levels of abstraction, e.g. data transfer, data harmonization, and data processing workflows. e-Infrastructures are gaining an important place in the field of biodiversity conservation. Their computational capabilities help scientists to reuse models, obtain results in a shorter time, and share these results with colleagues. They are also used to access several heterogeneous biodiversity catalogues.
In this course, the D4Science e-Infrastructure will be used to conduct experiments in the field of biodiversity conservation. D4Science hosts models and contributions by several international organizations involved in the biodiversity conservation field. The course will give students an overview of the models, practices, and methods that large international organizations like FAO and UNESCO apply by means of D4Science. At the same time, the course will introduce students to the basic concepts underlying e-Infrastructures, Virtual Research Environments, data sharing, and experiment reproducibility.
Advancing Reproducible Science from Physical Samples: The IGSN and the iSampl... - Kerstin Lehnert
Presentation at the Geological Society of America (GSA) meeting 2016 in the session on FOSSIL SPECIMENS 0'S AND 1'S: DATABASES, STANDARDS, & MOBILIZATION
Making Small Data BIG (UT Austin, March 2016) - Kerstin Lehnert
Presentation given at the Texas Advanced Computing Center. It describes the potential of re-using small data for new science, achievements and the challenges to make small data re-usable.
Presentation about the IGSN and ongoing initiatives for the Internet of Samples at the EGU 2015 short course "Open Science Goes Geo: Beyond Data and Software".
Scratchpads: Building web communities supporting biodiversity science - Vince Smith
Presented by Dave Roberts at a meeting titled "Information Technology in Biodiversity Conservation and in Agriculture" organized by the Club of Rome and the EU ICT-ENSURE project, at UNESCO, Paris. January 15th, 2009.
Biodiversity Informatics: An Interdisciplinary Challenge - Bryan Heidorn
"Impacto de la Informática en el Conocimiento de la Biodiversidad: Actualidad y Futuro" at Universidad Nacional de Colombia on August 12, 2011. https://sites.google.com/site/simposioinformaticaicn/home
TraitBank is the structured data service of the Encyclopedia of Life. Launched in 2014, it currently hosts 9 million data records for 1.7 million taxa, including trait records (e.g. cell size, life history traits) and other attributes including administrative ones (e.g. IUCN status, type specimen repository). Marine datasets include verbal localities from WoRMS, habitat categories from AlgaeBase, water temperature ranges based on known occurrence records from OBIS, and literature-derived datasets including cell masses of phytoplankton and tissue mineralization types of algae and invertebrates. Hosted records include all available metadata, including detailed attribution; URL of data source if online; organism information including sex and life stage; date, locality and method information for field studies; and any other fields provided by the source. TraitBank is not a repository. Most hosted records are deposited with a scholarly publication, or an institutional or aggregator database. Presence in TraitBank makes individual records findable by EOL search (http://eol.org/data_search) or web search engine. Search results on EOL are available by CSV download and records are available to semantic web applications via a JSON-LD web service, including all metadata.
Fresh Data is a data search service in development primarily for the Citizen Science community, funded by NSF. Interested occurrence data providers will register to be indexed. Their data will be deposited at GBIF, using the IPT, if possible, and in TraitBank otherwise (e.g. presence/absence or abundance data, if GBIF cannot accommodate them). Searchers can query the index for recent records by time, location and taxonomic group. Registered researchers will also be able to save and publish their data queries, which will alert them if new data appears matching their criteria, and alert the data provider that their data was delivered to a subscriber.
Research Data Infrastructure for Geochemistry (DFG Roundtable) - Kerstin Lehnert
This presentation provides an overview of different aspects of data management for geochemistry and resources available at the EarthChem@IEDA data facility.
Research infrastructures: the case for integrating freshwater biodiversity data - Aaike De Wever
Achievements of the EU BioFresh project on freshwater biodiversity. Importance to keep mobilising data and continue the work on the web infrastructure.
A Model to Represent Nomenclatural and Taxonomic Information as Linked Data. ... - Franck Michel
Presentation of an article published at the 2nd International Workshop on Semantics for Biodiversity (S4Biodiv 2017), co-located with ISWC2017.
Article: https://hal.archives-ouvertes.fr/hal-01617708
Taxonomic registers are key tools to help us comprehend the diversity of nature. Publishing such registers in the Web of Data, following the standards and best practices of Linked Open Data (LOD), is a way of integrating multiple data sources into a world-scale, biological knowledge base. In this paper, we present an on-going work aimed at the publication of TAXREF, the French national taxonomic register, on the Web of Data. Far beyond the mere translation of the TAXREF database into LOD standards, we show that the key point of this endeavor is the design of a model capable of capturing the two coexisting yet distinct realities underlying taxonomic registers, namely the nomenclature (the rules for naming biological entities) and the taxonomy (the description and characterization of these biological entities). We first analyze different modelling choices made to represent some international taxonomic registers as LOD, and we underline the issues that arise from these differences. Then, we propose a model aimed to tackle these issues. This model separates nomenclature from taxonomy, it is flexible enough to accommodate the ever-changing scientific consensus on taxonomy, and it adheres to the philosophy underpinning the Semantic Web standards. Finally, using the example of TAXREF, we show that the model enables interlinking with third-party LOD data sets, may they represent nomenclatural or taxonomic information.
Understanding the Big Picture of e-Science - Andrew Sallans
A. Sallans. "Understanding the Big Picture of e-Science." Presented at the 2011 eScience Bootcamp at the University of Virginia's Claude Moore Health Sciences Library. 4 March 2011
Lecture for a course at NTNU, 27th January 2021
CC-BY 4.0 Dag Endresen https://orcid.org/0000-0002-2352-5497
See also http://bit.ly/biodiversityinformatics
https://www.gbif.no/events/2021/lecture-ntnu-gbif.html
The International Conference on Integrative Biology Summit will be organized around the theme "Accelerating Computational Approaches to Biological Research."
Using agent-based simulation for socio-ecological uncertainty analysis - Bruce Edmonds
A talk given in the MMU Big Data Centre, 30th October 2018.
Both social and ecological systems can be highly complex, but the interaction between these two worlds - a socio-ecological system (SES) - can add even greater levels of complexity. However, the maintenance of SES is vital to our well-being and the health of the planet. We do not know how such systems work in practice and we lack good data about them (especially the ecological side), so predicting the effect of any particular policy is infeasible. Here we present an approach which tries to understand some of the ways in which SES may go wrong, by constructing different complex simulation models and analysing the emergent outcomes. These in silico examples can allow for the institution of targeted data gathering instruments that give the earliest possible warning of deleterious outcomes, and thus allow for timely remedial responses. An example of this approach applied to fisheries is described.
Digital Representation of Physical Samples in Scientific Publications - Kerstin Lehnert
Presentation about the digital representation of physical samples in scientific publications, given at the European Geoscience Union meeting 2015 in the Splinter Meeting 1.36 "Digital Representation of Physical Samples in Scientific Publications".
Identifying and Linking Physical Samples with Data: Using IGSN - ARDC
7 June 2017
This webinar is the second in a series examining persistent identifiers and their use in research. This webinar:
introduced the IGSN, outlining its structure, use, application and availability for Australian researchers and research institutions
discussed the international symposium Linking Environmental Data and Samples.
Watch full webinar: https://www.youtube.com/watch?v=mOJRaLwOaCs
iSamples Research Coordination Network (C4P Webinar) - Kerstin Lehnert
The iSamples (Internet of Samples in the Earth Sciences) Research Coordination Network is part of EarthCube and focuses on the integration of physical samples and collections into digital data infrastructure in the Earth sciences. This presentation summarizes the activities of the iSamples RCN and presents results from a major community survey about sharing and management of physical samples that was conducted as part of the RCN.
Research data management: a tale of two paradigms: Martin Donnelly
Presentation I was supposed to give at "Scotland’s Collections and the Digital Humanities" workshop in Edinburgh on May 2nd 2014. Illness prevented it, but my heroic DCC colleague Jonathan Rans stepped up and delivered the presentation on my behalf.
Research Data Management: A Tale of Two Paradigms - tarastar
Presentation by Martin Donnelly, Digital Curation Centre, University of Edinburgh. Invited talk at a workshop for 'Scotland's National Collections and the Digital Humanities,' a knowledge-exchange project hosted at the University of Edinburgh. 2 May 2014. http://www.blogs.hss.ed.ac.uk/archives-now/
RDA Fourth Plenary Keynote - Prof. Christine L. Borgman, Professor Presidential Chair in Information Studies at UCLA: "Data, Data, Everywhere, Nor Any Drop to Drink." Tuesday 23rd Sept 2014, Amsterdam, the Netherlands
https://rd-alliance.org/plenary-meetings/fourth-plenary/plenary4-programme.html
Scott Edmunds talk at AIST: Overcoming the Reproducibility Crisis: and why I ... - GigaScience, BGI Hong Kong
Scott Edmunds talk at the AIST Computational Biology Research Center in Tokyo: Overcoming the Reproducibility Crisis: and why I stopped worrying and learned to love open data (& methods), July 1st 2014
Birgit Schmidt: RDA for Libraries from an International Perspective - dri_ireland
From "A National Approach to Open Research Data in Ireland", a workshop held on 8 September 2017 in National Library of Ireland, organised by The National Library of Ireland, the Digital Repository of Ireland, the Research Data Alliance and Open Research Ireland.
Data Management in the context of Open Science.
Because open access has become mandatory for publications and project-funded research data, it is the responsibility of each researcher to be informed about, and then trained in, new practices.
Open Data in a Big Data World: easy to say, but hard to do? - LEARN Project
Presentation at 3rd LEARN workshop on Research Data Management, “Make research data management policies work”
Helsinki, 28 June 2016, by Sarah Callaghan, STFC Rutherford Appleton Laboratory
Introduction to research data management; Lecture 01 for GRAD521 - Amanda Whitmire
Lesson 1: Introduction to research data management. From a series of lectures from a 10-week, 2-credit graduate-level course in research data management (GRAD521, offered at Oregon State University).
The course description is: "Careful examination of all aspects of research data management best practices. Designed to prepare students to exceed funder mandates for performance in data planning, documentation, preservation and sharing in an increasingly complex digital research environment. Open to students of all disciplines."
Major course content includes: Overview of research data management, definitions and best practices; Types, formats and stages of research data; Metadata (data documentation); Data storage, backup and security; Legal and ethical considerations of research data; Data sharing and reuse; Archiving and preservation.
See also, "Whitmire, Amanda (2014): GRAD 521 Research Data Management Lectures. figshare. http://dx.doi.org/10.6084/m9.figshare.1003835. Retrieved 23:25, Jan 07, 2015 (GMT)"
Participatory Research: Extending Open Science beyond the ivory tower - Open ... - Claudia Göbel
Participatory research is when people who are not employed as scientists do research: on their own, in groups, and potentially in cooperation with people who are employed as scientists. There are many different methodologies and approaches to participatory research, for instance DIY science, Community-based Research, Participatory Action Research and Citizen Science.
Such concepts are not often considered in discussions on Open Science, which usually focus on developments within scientific institutions. In my talk I will argue that it is important to extend the idea of Open Science beyond academia by considering another shade of openness: openness for participation of volunteers and cooperation with civil society organisations.
With a focus on Citizen Science, I will introduce a pluralistic concept of participatory research, highlight current developments and discuss challenges of this growing field of practice in Europe.
This slide deck provides an update on the development of the Astromaterials Data System, a project funded by NASA to ensure the long-term accessibility and utility of lab analytical data acquired on astromaterials samples curated at the Johnson Space Center, including samples collected on the moon during the Apollo missions and meteorites collected in Antarctica.
Presentation about geochemical research data access and publication provided to the Australian Geochemistry Network by Kerstin Lehnert of EarthChem and the Astromaterials Data System
Boosting Data Science in Geochemistry: We Need Global Geochemical Data Standa... - Kerstin Lehnert
Presentation at AGU Fall Meeting 2018: Large-scale, global geochemical data syntheses like EarthChem and GEOROC have, for nearly two decades, inspired and made possible a vast range of scientific studies and new discoveries, facilitating the analysis and mining of geochemical data and creating new paradigms in geochemical data analysis such as statistical geochemistry. These syntheses provide easy access to fully integrated compilations of thousands of datasets (‘data fusion’) with millions of geochemical measurements that are accompanied by comprehensive and harmonized metadata for context and provenance to search, filter, sort, and evaluate the data.
The syntheses have been assembled and maintained through manual labor by data managers, who extract data and metadata from text, tables, and supplements of publications for inclusion in the databases, a time-consuming task due to the multitude of data formats, units, normalizations, vocabularies, etc., i.e. lack of best practices for geochemical data reporting. In order to support and advance future science endeavors that rely on access to and analysis of large volumes of geochemical data, we need to develop and implement global standards for geochemical data that not only make geochemical data FAIR (Findable, Accessible, Interoperable, Re-usable), but ready for data fusion. As more geochemical data systems are emerging at national, programmatic, and subdomain levels in response to Open Access policies and science needs, standard protocols for exchanging geochemical data among these systems will need to be developed, implemented, and governed.
Critical is the alignment with existing standards such as the Semantic Sensor Network (SSN) ontology, a recent joint W3C and OGC standard that standardizes description of sensors, observation, sampling, and actuation, with sufficient flexibility to allow details of these elements to be defined in different domains. New initiatives within the International Council for Science and CODATA are working towards coordinating the International Science Unions to identify and endorse the more authoritative standards (including vocabularies and ontologies). These initiatives present a timely opportunity for geochemical data to ensure that they are born ‘connected’ within and across disciplines.
Looking at the past of infrastructure development for research data in the context of infrastructure development patterns and experiences from the evolution of the IEDA data facility to inform future pathways and developments. A major focus of the lecture is on the FAIR principles and the issues surrounding reusability of data.
Presentation that describes the experiences and insights of the IEDA data facility gained during the >10 years of building cyberinfrastructure for a long-tail community geochemistry
Interdisciplinary Data Resources for Volcanology at the IEDA (Interdisciplina...Kerstin Lehnert
Presentation given at the EGU 2015 General Assembly in session "Methods for Understanding Volcanic Hazards and Risks" (NH2.2), describing EarthChem data systems that make accessible and synthesize geochemical data of volcanic rocks and gases, and the System for Earth Sample Registration that catalogs sample metadata and provides persistent unique sample identifiers (International Geo Sample Number IGSN). It also mentions EarthChem's plans and ongoing work to link geochemical data with other volcanological databases, and the IEDA data rescue initiative.
Lehnert: Making Small Data Big, IACS, April2015Kerstin Lehnert
Seminar presentation at the Institute for Advanced Computational Science at Stony Brook University, April 9, 2015, describing achievements and challenges of data infrastructure in a long-tail science domain with the example of geochemistry.
MoonDB: Restoration & Synthesis of Planetary Geochemical DataKerstin Lehnert
This presentation explains the MoonDB project that will restore and synthesize geochemical and petrological data acquired on lunar samples over more than 4 decades. The project is a collaboration between the IEDA data facility (http://www.iedadata.org) at the Lamont-Doherty Earth Observatory of Columbia University and the Astromaterials Acquisition and Curation Office (AACO) at Johnson Space Center (JSC).
This presentation was part of a workshop of IEDA (http://www.iedadata.org) at the AGU (American Geophysical Union) Fall Meeting 2013 in San Francisco that was intended as an introduction to the topic of data publication.
1. Sample Registration
Made Easy
KERSTIN LEHNERT
System for Earth Sample Registration SESAR
http://www.geosamples.org
8/18/2019 GOLDSCHMIDT 2019 WORKSHOP "VIVE LES SAMPLES"
2. The Value of Samples
Specimens/samples are the source of observational
data and measurements across disciplines.
◦ Study the inaccessible in time and space.
◦ Study properties that cannot be measured in-situ.
Samples provide irreplaceable evidence of long-term
historical trends.
◦ Record the state of nature at a given place & time.
Samples record unique events in history.
Samples are essential to calibrate proxy data.
Samples serve as standards or references.
“Research projects
involve the study of
physical objects
collected from places
ranging from the
earth’s interior to the
depths of the ocean to
the reaches of outer
space.”
“Scientific Collections: Mission-Critical
Resources for Federal Science Agencies”
IWGSC, 2009
3. Sharing Samples
Providing access to actual physical samples is important. .85 (.16)
Providing access to actual physical samples is easy. .38 (.23)
iSamples RCN Survey
Joel Cutcher-Gershenfeld, 2015
4. Sharing Samples: Community Concerns
“Global Access to Global Collections: establish repositories for
all physical samples and the biological, geochemical and
physical measurements made from those samples.”
(Paleogeoscience)
“Poor and uneven access and management of sample
collections, incomplete sample tracking and linking of samples
to analyses in the literature and databases, discoverability of
existing samples” (Petrology & Geochem)
“Need central archive of experimental samples with integrated
workflows, database templates, and community-wide DOI
system for samples” (Mineral Physics & Rock Deformation)
From Executive Summaries of EarthCube Domain End-user Workshops 2013
5.
M. McNutt, K. Lehnert, B. Hanson, B. A. Nosek, A. M. Ellison, J. L.
King; SCIENCE Policy Forum, 04 MAR 2016
“Access to data, samples, methods, and reagents used to conduct
research and analysis, as well as to the code used to analyze and
process data and samples, is a fundamental requirement for
transparency and reproducibility.”
6. AGU 2019 Union Session on Samples
(Inter)National Treasures: Advancing Earth, Environmental, & Planetary
Sciences Through Access, Accreditation, and Use of Natural History
Samples and Collections
Panelists:
Marcia McNutt, National Academies of Sciences, Engineering & Medicine
Carol Roetzel Butler, National Museum of Natural History
David E Schindel, Smithsonian Institution
Mark Wimer, USGS
Dimitri Koureas, DiSSCo/Naturalis, Netherlands
Jennifer Mabuka-Maroa, African Academy of Sciences, Kenya
Lesley Wyborn, Australian National University
7. Tracking Samples & Sample Data
Have you ever been able to find all data for a specific sample in the
literature?
Have you been able to figure out if samples in different publications
that have the same name or number are actually from the same
specimen?
Are you able to identify every sample in your lab, desk, or archive
and find out within seconds where, when, and how you collected
the sample?
No?
8. Example 1
Problems:
Ambiguous sample naming
Lack of relevant metadata
Data are not reproducible
Sample cannot be located
“The key measurement was the one backarc basalt
called “PPTUW”...
Subsequent efforts to confirm the observation ran
into problems. The apparently-same sample was
variously called PPTU, PPTUW/5, PPTUW-1, and
TVZ19 in four other papers. None of those papers
gave its latitude and longitude…!”
(J. Gill and E. Todd, personal communication 2013, related to
IEDA data rescue effort)
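A minimal sketch of how one persistent identifier would remove the ambiguity in Example 1. It assumes the five published names really do refer to one specimen (the quote only says "apparently-same"). The alias table, the function name, and the identifier "IEXMP0001" are hypothetical placeholders for illustration, not registered IGSNs or part of any SESAR tool.

```python
# Hypothetical alias table: every name used in the literature maps to one identifier.
# "IEXMP0001" is a made-up placeholder, not a registered IGSN.
ALIASES = {
    "PPTUW": "IEXMP0001",
    "PPTU": "IEXMP0001",
    "PPTUW/5": "IEXMP0001",
    "PPTUW-1": "IEXMP0001",
    "TVZ19": "IEXMP0001",
}


def same_specimen(name_a: str, name_b: str) -> bool:
    """Return True only if both names are known and map to the same identifier."""
    id_a = ALIASES.get(name_a)
    id_b = ALIASES.get(name_b)
    return id_a is not None and id_a == id_b


if __name__ == "__main__":
    print(same_specimen("PPTU", "TVZ19"))  # both names are in the alias table
    print(same_specimen("PPTU", "M1"))     # "M1" is unknown, so no match is claimed
```

If each paper had cited the identifier itself, the lookup table would not be needed at all; the sketch only shows what has to be reconstructed by hand when it is missing.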
9. Example 2
Problem:
Dear Dr. Goldstein,
I was re-reading your wonderful paper “A Sm-Nd isotope
study of atmospheric dusts and particulates from major river
systems” that was published in EPSL in 1984, and had a quick
question about the Mississippi River sample included in Table
1. I have a student who is working on the REE geochemistry of
the Mississippi River and its associated estuary for his PhD
dissertation and we are trying to compile all of the Nd isotope
data from the literature. Anyway, the sample you list in Table
1 of your paper is identified as a “bulk sample”, which I
assume is a bulk river sediment sample. Is this correct? Also,
do you remember approximately where it was collected along
the river?
Best wishes,
Karen
Incomplete and ambiguous metadata
10. Example 2
… Anyway, to answer your question, that sample was
provided by Bob Meade of the USGS. … But I don’t know if it
was suspended material, bedload, or deposited on the banks.
Best I can do with the location at this point is to refer to the
figure in the paper, which shows it was collected close on the
delta.
I noticed that it says in the paper that info on the samples is
available from the authors. That was true at the time, and
even probably a decade or so later, but at this point I don’t
know where the notes are for those samples, once again
showing the importance of IGSNs. When I’m back at LDEO I’ll
check to see if I can find that old notebook.
Loss of metadata
Data cannot be re-used
11. ANDS Webinar IGSN | Linking Data and Samples
Why do we need a unique identifier for samples (Part 1) ?
In the EarthChem global geochemical database all
these samples are labeled ‘M1’
12. What Are the Problems?
Lack of central or federated catalogs of sample metadata to find samples,
preserve, and provide persistent access to sample metadata
Lack of common Best Practices for sample identification, documentation, and
registration that are essential to build such catalogs.
Lack of software tools that support personal or institutional sample management &
curation.
Lack of facilities for sample curation and archiving.
13. Addressing the Problems
The International Geo Sample Number IGSN
The System for Earth Sample Registration SESAR
14. IGSN International GeoSample Number
A globally unique and persistent identifier for physical objects in the Earth
Sciences
◦ guaranteed to be unique via a centralized control mechanism (unique name spaces)
◦ resolves to virtual sample representations (sample metadata profiles) managed at federated
IGSN Allocating Agents.
People
• Name: Kerstin Lehnert
• SSN: 768-90-6482
Samples
• Name: HLY0102 D3-1
• IGSN: KAL7J8F55
15. Persistent Identifiers (PID)
Build a FAIR Data Ecosystem
Locate (Find)
Access
Link (Interoperate)
Cite
ESIP SUMMER MEETING 2019 15
Programs: Cruise DOI
Dataset publication: Dataset DOI
Funding: FundRef#
Article publication: Publication DOI
Researchers: ORCID
Samples: IGSN
16. IGSN Overview: what does it do?
Provides identifiers that are guaranteed to be unique via an international
governance system (like assigning IP addresses)
Allows discovery and access to physical samples online:
◦ Web applications and programmatic access to sample metadata catalogues
◦ Networks with sample repositories and data centres
Ensures preservation of, and access to sample data
Aids in the unambiguous identification of samples in the literature and of data
derived from them
Try it out: http://igsn.org/ICDP5054ESYI201 or http://igsn.org/AU1101
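The "try it out" links follow a simple pattern: resolver address plus the IGSN. A minimal sketch of building such a link is shown here. The function name, the handling of an "IGSN:" label, and the alphanumeric check are assumptions made for illustration (the two examples on the slide happen to be alphanumeric); they are not an official IGSN syntax rule.

```python
RESOLVER_BASE = "http://igsn.org/"  # resolver address as shown on the slide


def igsn_url(igsn: str) -> str:
    """Build a resolver URL for an IGSN string.

    Assumptions (not from the IGSN specification): surrounding whitespace and
    an optional leading "IGSN:" label are stripped, and anything that is not
    purely alphanumeric is rejected.
    """
    value = igsn.strip()
    if value.upper().startswith("IGSN:"):
        value = value[5:].strip()
    if not value or not value.isalnum():
        raise ValueError(f"unexpected IGSN string: {igsn!r}")
    return RESOLVER_BASE + value


if __name__ == "__main__":
    print(igsn_url("AU1101"))
    print(igsn_url("IGSN: ICDP5054ESYI201"))
```

The sketch only constructs the link; it does not contact the resolver, so it says nothing about whether a given identifier is actually registered.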
17. What IGSN can be used for
Geological samples and other materials
(rocks, water, biological materials, …)
Collections (groupings of samples)
Sampling features (boreholes, outcrops, …)
Samples can be linked to each other through
the “related identifier” metadata element
(e.g., minerals separated from a parent rock,
legs from a fossil beetle).
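The parent/child linking can be pictured with a small data structure. This is a sketch under stated assumptions: the class, its field names, and the identifiers are hypothetical, and the single `parent_igsn` field is only a stand-in for the "related identifier" metadata element, whose real schema is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SampleRecord:
    igsn: str                          # placeholder identifiers, not registered IGSNs
    name: str
    parent_igsn: Optional[str] = None  # stand-in for the "related identifier" element


def children_of(parent: str, records: List[SampleRecord]) -> List[str]:
    """List the identifiers of all records that point to the given parent."""
    return [r.igsn for r in records if r.parent_igsn == parent]


def lineage(igsn: str, records: List[SampleRecord]) -> List[str]:
    """Walk from a subsample up to its root sample, guarding against cycles."""
    by_id: Dict[str, SampleRecord] = {r.igsn: r for r in records}
    chain: List[str] = []
    current = by_id.get(igsn)
    while current is not None and current.igsn not in chain:
        chain.append(current.igsn)
        current = by_id.get(current.parent_igsn) if current.parent_igsn else None
    return chain


RECORDS = [
    SampleRecord("XXROCK001", "parent rock"),
    SampleRecord("XXMIN0001", "mineral separate A", parent_igsn="XXROCK001"),
    SampleRecord("XXMIN0002", "mineral separate B", parent_igsn="XXROCK001"),
]

if __name__ == "__main__":
    print(children_of("XXROCK001", RECORDS))
    print(lineage("XXMIN0001", RECORDS))
```

Storing the link on the child keeps the parent record unchanged when new splits are registered later.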
18. Tracking the sample life cycle
IGSN supports tracking of samples and
sample logistics.
◦ In the field: unambiguous identification,
metadata capture with mobile app.
◦ In the lab: identification and tying data to
samples.
◦ In the sample repository: identify collections
and samples in storage, catalogue, manage
sample logistics.
◦ In the data repository: link samples to data and
publications; link data for a given sample across
different publications and databases.
19. IGSN: Supports Shared Collections
“Samples collected during collaborative Field
Institutes will be assigned International
GeoSample Numbers (IGSNs) and registered
with the System for Earth Sample
Registration (SESAR).
In contrast to a traditional “field trip”, wherein
an expert leads a group of participants
through the field area pointing out features of
interest along the way, then quickly moving on
to the next stop, the mission of ExTerra Field
Institutes is to spend a longer amount of time
at a smaller number of stops, making field
observations and collecting samples for group
research.”
http://geoprisms.org/exterra/sample-data-management/
20. IGSN: Enables Linking of Samples with
Data and Publications
Specimen (IGSN) Spectral Results (DOI) Publication (DOI)
21. IGSN in the Literature
Earth science publishers recommend the use of
IGSN to reference samples in community
commitment statements*
◦ Example: Dere, A. L., T. S. White, R. H. April, B.
Reynolds, T. E. Miller, E. P. Knapp, L. D. McKay, and S. L.
Brantley (2013), Climate dependence of feldspar
weathering in shale soils along a latitudinal gradient,
Geochimica et Cosmochimica Acta, 122, 101–126,
http://dx.doi.org/10.1016/j.gca.2013.08.001.
*see: https://copdess.org/community-commitment-statements/
22. Adoption
Repositories will strive to: “... Ensure that unique, persistent identifiers are used for
authors (e.g., ORCID), research objects (e.g., Digital Object Identifier), and physical
samples (e.g., IGSN).”
Publishers will strive to: “... Implement standard identifiers for all authors (e.g., ORCID),
author contributions (e.g., CRediT), samples (e.g., IGSN), institutions, funders and grants,
and other identifiers as they are developed and adopted.”
24. IGSN Adoption: Publishers
“… AGU Publications also strongly encourages use of
other identifiers in our journal papers. International Geo
Sample Numbers (IGSNs) uniquely identify items, such
as a rock sample, a piece of coral, or a vial of water
taken from the natural environment, and provide
important, consistent information about these samples.
Registering samples and including the IGSN in papers
helps secure provenance information but most
importantly connects common samples across multiple
studies in the literature. IGSNs also will help you keep
track of your samples. These identifiers can be reserved
before a field season or assigned afterward.”
Hanson, B. (2016), AGU opens its journals to author identifiers,
Eos, 97, doi:10.1029/2016EO043183.
Published on 7 January 2016.
26. IGSN in Data Systems: EarthChem Library
27. The IGSN Organization
24 members in the IGSN e.V.
In 5 countries (4 continents)
8 functional Allocating Agents (AA)
Multiple AAs under development
◦ British Geological Survey
◦ USGS
◦ CNRS
◦ SAEON (South Africa)
Number of Registered Samples by Allocating Agent (chart, log scale):
SESAR: 4,344,036
GeoSciAus: 2,364,916
MARUM: 136,476
CSIRO: 32,633
GFZ: 7,948
IFREMER: 4,258
KIGAM: 246
ARDC: 2
28. ... And Growing
iSamples project (in development): adoption of IGSN in biology and archeology
DiSSCo (Distributed System of Scientific Collections in Europe): committed to
using IGSN, 2 billion specimens to be registered!
International Ocean Discovery Program (IODP): Repository at MARUM already using IGSN,
JAMSTEC and TAMU are planning implementation
Smithsonian Institution (beyond National Mineral Collection)
NASA: Astromaterial collections registration in process
National Labs: LLNL, LBNL, BNL starting
29. Recent Developments
Organization has grown substantially over the last 2-3 years with major
organizations joining IGSN e.V.
Expansion beyond Earth sciences is happening.
IGSN2040 project funded by Sloan Foundation in 2018.
“develop a strategic plan and roadmap that will guide the IGSN system in its next
chapter so it will be able to fulfill its mission of providing persistent, sustainable, and
reliable PID services to the international science community.”
30. SESAR System for Earth Sample Registration
Web-based database that catalogs and preserves metadata of samples
submitted by users (incl. researchers, repositories, labs)
Allocating Agent in the IGSN e.V. (International Geo Sample Number)
Authenticated workspace for users to submit and manage sample metadata
Online search of the metadata catalog
www.iedadata.org
www.geosamples.org
31. How to Register Your Samples
37. To see a list of SESAR controlled and
suggested vocabularies, including Object
Type, see
www.geosamples.org/help/vocabularies.
38. • Check off the metadata fields you
wish to complete
• Click “Submit to create template”
• Open the zip file with the batch
template and the SESAR Quick
Guide
• The Guide provides examples,
definitions, and additional
instructions for entering metadata
for each field in the template.
40. Complete the Template!
NOTE:
• Currently a template is for a single sample type only
• Private/public setting applies to all samples in a single
template
• Check the instructions for date format.
Improvements coming soon!
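The two template constraints in the note can be checked before upload. The following is a minimal sketch, assuming the template rows have already been read into dictionaries; the keys "sample_type" and "is_private" and the function name are hypothetical and do not reflect the actual SESAR template column names. The date format is not checked because the slide does not state it.

```python
from typing import Dict, List


def check_batch(rows: List[Dict[str, object]]) -> List[str]:
    """Return a list of problems; an empty list means both constraints hold.

    Hypothetical keys: "sample_type" and "is_private".
    """
    if not rows:
        return ["template has no samples"]
    problems: List[str] = []
    # Constraint 1: a template is for a single sample type only.
    types = sorted({str(row.get("sample_type")) for row in rows})
    if len(types) > 1:
        problems.append("more than one sample type: " + ", ".join(types))
    # Constraint 2: the private/public setting applies to all samples.
    settings = {bool(row.get("is_private")) for row in rows}
    if len(settings) > 1:
        problems.append("mixed private/public settings in one template")
    return problems


if __name__ == "__main__":
    batch = [
        {"name": "D3-1", "sample_type": "Dredge", "is_private": False},
        {"name": "D3-2", "sample_type": "Core", "is_private": False},
    ]
    for problem in check_batch(batch):
        print(problem)
```

A batch that fails either check would need to be split into separate templates, one per sample type and per private/public setting.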
44. You will receive an email, usually within a
day, confirming that the samples have been
registered and providing the assigned IGSNs.
48. Pre-registering Samples
Before Fieldwork or
Subsampling
Example:
Upload metadata for pre-registered
samples after field work is completed.
Other use cases:
- change release date for private
samples
- add more specific metadata after
samples have been studied in the lab
- add parent IGSNs if they were
unknown at time of registration
49. Batch Update: Before and After
50. Sample Registration: Important Advice
Samples should be registered by the sample owner (who has the physical object)
◦ Metadata management can be collaborative (sample owner can share SESAR account privileges)
◦ Sample metadata can be transferred if the sample ownership changes
Register samples as soon as possible after collection (in the field, in the repository)
◦ Possibility to ‘pre-register’ IGSNs so you can label samples with IGSNs in the field
Register any subsamples and splits and link to the ‘parent sample’
Ensure that your sample metadata are as comprehensive as possible from the start
◦ You can add metadata later, but will you?
◦ How discoverable and re-usable are your samples without critical metadata?
51. SESAR Help Resources
https://www.youtube.com/user/iedadata
http://www.geosamples.org/help
52. iSamples RCN Resources
Training modules for Sample Management
◦ Created by Early Career Scientists (A.Dere, B. Hallett)
◦ Sample type specific (soil cores, rock outcrop samples)
◦ Published in EarthChem Library
MARS (Middleware for Assisting with the Registration of Samples, J. Bowring)
◦ software prototype that allows users to seamlessly push metadata from a preferred sample
metadata format to SESAR
https://www.earthcube.org/group/isamples
53. Thanks! Questions?
Contact us: info@geosamples.org
Join us at AGU Fall Meeting 2017
- IEDA booth in exhibit hall (#1519)
- IGSN Information Session (for date
and location, check our web site)
Spread the word!
Editor's Notes
The following simple statements are fundamental and establish a universal guideline for an implementation of the data infrastructure:
a Digital Object has a structured bit sequence that is stored in trustworthy repositories
a Digital Object is assigned a PID and metadata
the PID of a Digital Object is associated with all relevant kernel information that allows humans and machines to enable findability, accessibility, interoperability and re-usability
kernel information and digital objects have types allowing humans and machines to associate operations with them