Rutherford Appleton Laboratory uses Panasas ActiveStor to accelerate global c... (Panasas)
With nearly 8.5 petabytes of ActiveStor storage, the Panasas installation at Rutherford Appleton Laboratory (RAL) represents one of the largest multi-location, high-performance computing (HPC) storage deployments in Great Britain. Panasas ActiveStor gives RAL a solution that offers extreme scalability and simple storage management capabilities so that scientists can focus on important research, not on cumbersome system administration.
This slide deck provides an update on the development of the Astromaterials Data System, a project funded by NASA to ensure the long-term accessibility and utility of lab analytical data acquired on astromaterials samples curated at the Johnson Space Center, including samples collected on the moon during the Apollo missions and meteorites collected in Antarctica.
Data systems in NASA's Earth Science Division are primarily focused on providing stewardship of remote-sensing products and are manifested as Distributed Active Archive Centers (DAACs). Each Instrument Team has a related Science Team, which defines the algorithms, monitors the processing of the instruments' output into the related data products, and ensures their format and standards compliance. These teams are also influenced by the research and applied-sciences components of the programs, but the primary focus is on proving the ongoing validity of the products. Across the distributed system, every product is different; however, this is not conducive to analytics. NASA's Advanced Information Systems Technology (AIST) program is developing an entirely new approach to creating Analytic Centers, which focus on the scientific investigation and harmonize the data, computing resources and tools to enable and accelerate scientific discovery. Stay tuned to find out how. A major element of today's science interests is the comparison of multi-dimensional datasets; this warrants considerable experimentation in trying to understand how to do so meaningfully and quantitatively, or, asked another way, "What do you mean by similar?" Uncertainty quantification has evolved considerably in the arenas of data reduction and full physics models; however, the emerging demand for machine learning and other artificial intelligence techniques has failed to keep uncertainty quantification and error propagation in mind, and there is considerable work to be done.
Eli Whitney invented the cotton gin while staying with Catherine Greene. He experimented over the winter and spring, arriving at a design with a spiked cylinder that pulled cotton fibers through slits while leaving seeds behind. This allowed one man and a horse to do the work of 50 men using the old machines. However, Whitney made little money, as others copied his invention without paying, and he was left virtually penniless. Whitney later moved to New Haven and developed the idea of interchangeable parts for manufacturing firearms, allowing unskilled workers to produce consistent products using machines. This "American System of Manufacture" spread across the country.
2013 OHSUG - Clinical Data Warehouse Implementation (Perficient)
The document discusses implementing a data warehouse using Oracle Life Sciences Hub (LSH). It covers example types of data warehouses including operational, exploratory analysis, medical review, and safety mining. Techniques for creating data warehouses within and external to LSH are presented, along with common challenges such as auditing, expertise, and standards changes. The presentation provides an overview of data warehouse implementation using LSH.
STORAGE GROWING FORECAST WITH BACULA BACKUP SOFTWARE CATALOG DATA MINING (csandit)
Backup software information is a potential source for data mining: not only the unstructured data backed up from all the other servers, but also the backup job metadata stored in what is known as the catalog database. Mining this database, in particular, could be used to improve backup quality, automation and reliability, predict bottlenecks, identify risks and failure trends, and provide specific reports that cannot be fetched from a closed-format proprietary backup software database. Ignoring such a data mining project might be costly, leading to unnecessary human intervention, uncoordinated work and pitfalls such as backup service disruption caused by insufficient planning. The specific goal of this practical paper is to use Knowledge Discovery in Databases, time series analysis, stochastic models and R scripts to predict backup storage data growth. This project could not be done with traditional closed-format proprietary solutions, since it is generally impossible to read their database data from third-party software because of deliberate obscurity driven by vendor lock-in. It is, however, very feasible with Bacula, currently the third most popular backup software worldwide, and open source. This paper focuses on the backup storage demand prediction problem, using the most popular prediction algorithms; among them, the Holt-Winters model had the highest success rate for the tested data sets.
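The Holt-Winters approach named above can be sketched with a minimal additive triple exponential smoothing implementation. This is an illustrative plain-Python sketch, not the paper's code (the paper used R); the function name, smoothing constants and toy series are assumptions made for the example:

```python
def holt_winters_additive(series, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters (triple exponential smoothing).

    series  -- observed values, at least two full seasons long
    m       -- season length (e.g. 7 for daily data with a weekly cycle)
    alpha, beta, gamma -- smoothing constants in (0, 1) for level,
                          trend and seasonality
    horizon -- number of future steps to forecast
    """
    # Initialise level and trend from the first two seasons.
    level = sum(series[:m]) / m
    trend = (sum(series[m:2 * m]) - sum(series[:m])) / (m * m)
    season = [series[i] - level for i in range(m)]
    # One smoothing pass over the remaining observations.
    for t in range(m, len(series)):
        prev_level = level
        level = alpha * (series[t] - season[t % m]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = gamma * (series[t] - level) + (1 - gamma) * season[t % m]
    # Project level and trend forward, re-applying the seasonal component.
    return [level + (h + 1) * trend + season[(len(series) + h) % m]
            for h in range(horizon)]
```

On a synthetic trend-plus-seasonality series the forecasts track the true continuation closely; on real catalog data one would tune the smoothing constants, for instance by minimising in-sample forecast error.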
Kowal RDAP11 Data Archives in Federal Agencies (ASIS&T)
Dan Kowal, NOAA/NGDC; Data Archives in Federal Agencies; RDAP11 Summit
The 2nd Research Data Access and Preservation (RDAP) Summit
An ASIS&T Summit
March 31-April 1, 2011 Denver, CO
In cooperation with the Coalition for Networked Information
http://asist.org/Conferences/RDAP11/index.html
Users, Applications and the Community of Practice for the Air Quality Scenario (Rudolf Husar)
The document discusses the GEOSS (Global Earth Observation System of Systems) architecture for the air quality community. It proposes an architecture where air quality services could register with the GEOSS registry and be discovered and invoked by users. This would allow data analysts to compose and visualize air quality data workflows to inform decision makers. It also discusses establishing an air quality community of practice to facilitate collaboration.
2008-05-05 GEOSS UIC-ADC AQ Scenario Workshop, Toronto (Rudolf Husar)
The document discusses the GEOSS (Global Earth Observation System of Systems) architecture for the air quality community. It proposes an architecture where air quality services register with the GEOSS registry and are discoverable through the GEOSS clearinghouse. This would allow users to find, select, and link to relevant air quality services. The architecture envisions community air quality catalogs that aggregate catalog listings and allow users to access data and models through composed workflows.
This document summarizes the fifth annual HDF workshop sponsored by ESDIS and NCSA. It provides an overview of the status of ESDIS, HDF/HDF-EOS, and plans for the future. Over 750 terabytes of Terra and Landsat 7 data have been processed and made available. Some instruments, like ASTER and CERES, now have validated data, while others, like MODIS, are still being reprocessed. Future plans include installing data pools at DAACs and procuring an EMD contract to support ongoing EOS operations. The community advisory process involves groups like UWGs, DAWG, and SWGD to provide feedback. HDF is a file format for scientific data, while HDF-EOS extends it with conventions for EOS geolocated data products.
How to Rapidly Configure Oracle Life Sciences Data Hub (LSH) to Support the M... (Perficient)
This document outlines best practices for rapidly configuring Oracle Life Sciences Data Hub (LSH) to support patient data management. It discusses data flows, conforming data to standards, necessary utilities and tools, infrastructure requirements, and implementation process. The presentation recommends hosting the LSH environment with BioPharm for a turnkey solution, reducing time and risks compared to a custom implementation.
2013 OHSUG - Use Cases for Using the Program Type View in Oracle Life Science... (Perficient)
This document outlines best practices for rapidly configuring Oracle Life Sciences Data Hub (LSH) to support patient data management. It discusses data flows, conforming data to standards, using utilities and tools, infrastructure requirements, and implementation process. The presentation provides examples of how to set up utilities, production environments, and recommends using a pre-configured solution from BioPharm Systems to quickly get a best practice LSH environment operational.
Curation and Preservation of Crystallography Data (ManjulaPatel)
A presentation given by Manjula Patel (UKOLN) at "Chemistry in the Digital Age: A Workshop connecting research and education", June 11-12th 2009, Penn State University,
http://www.chem.psu.edu/cyberworkshop09
A HYBRID LEARNING ALGORITHM IN AUTOMATED TEXT CATEGORIZATION OF LEGACY DATA (ijaia)
The goal of this research is to develop an algorithm to automatically classify measurement types from NASA's airborne measurement data archive. The product has to meet specific metrics in terms of accuracy, robustness and usability, as the initial decision-tree-based development showed limited applicability due to its resource-intensive characteristics. We have developed an innovative solution that is much more efficient while offering comparable performance. As in many industrial applications, the available data are noisy and correlated, and there is a wide range of features associated with the type of measurement to be identified. The proposed algorithm uses a decision tree to select features and determine their weights. A weighted Naive Bayes is used due to the presence of highly correlated inputs. The development has been successfully deployed at industrial scale, and the results show that it is well balanced in terms of performance and resource requirements.
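The two-stage design the abstract describes — a decision-tree criterion scoring features, with those scores reused as weights in a Naive Bayes classifier — can be sketched in plain Python. This is a hedged illustration, not the paper's implementation: `information_gain` stands in for the tree's feature-ranking criterion, and the weights scale each feature's log-likelihood contribution:

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(X, y, f):
    """Gain of splitting on categorical feature f -- the quantity a decision
    tree uses to rank features; reused here as a per-feature weight."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[row[f]].append(label)
    return entropy(y) - sum(len(g) / len(y) * entropy(g) for g in groups.values())

class WeightedNaiveBayes:
    """Naive Bayes over categorical features where each feature's
    log-likelihood is scaled by a supplied weight."""
    def fit(self, X, y, weights):
        self.weights = weights
        self.classes = sorted(set(y))
        self.priors = {c: math.log(y.count(c) / len(y)) for c in self.classes}
        self.cond = defaultdict(Counter)   # (feature, class) -> value counts
        self.values = defaultdict(set)     # feature -> observed values
        for row, label in zip(X, y):
            for f, v in enumerate(row):
                self.cond[(f, label)][v] += 1
                self.values[f].add(v)
        return self

    def _loglik(self, f, v, c):
        counts = self.cond[(f, c)]
        # Laplace smoothing over the feature's observed value set.
        return math.log((counts[v] + 1) / (sum(counts.values()) + len(self.values[f])))

    def predict(self, row):
        return max(self.classes, key=lambda c: self.priors[c] + sum(
            self.weights[f] * self._loglik(f, v, c) for f, v in enumerate(row)))
```

On a toy dataset where one feature is predictive and another is noise, the information-gain weights drive the noisy feature's contribution toward zero, which is the stated motivation for weighting in the presence of correlated or uninformative inputs.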
A presentation given as part of the DC101 training course run by the DCC at Oxford University in June 2010. The course provided data management guidance for researchers.
OAIS and Its Applicability for Libraries, Archives, and Digital Repositories... (faflrt)
ALA/FAFLRT Workshop on the Open Archival Information System (OAIS). Presented by Robin Dale, RLG. Sponsored by the ALA Federal and Armed Forces Libraries Roundtable (FAFLRT). Presented on June 16, 2001 at the ALA Annual Conference.
The Research Data Alliance (RDA) has developed a catalogue of metadata standards and tools aimed at researchers and those who support them. In its new version, the Metadata Standards Catalog will provide much greater detail about metadata standards and tools, and through its new API it will be usable within other applications. It will also provide a platform for furthering the work of the RDA Metadata Interest Group, which seeks to improve the interoperability of metadata in different standards by working towards semi-automatically generated converters.
AGS Members' Day 2015 - Data Transfer Format and BIM Presentation (ForumCourt)
This document discusses systems theory and the classification of data, information, knowledge, understanding, and wisdom according to Russell Ackoff. It then applies this model to geotechnical engineering, providing examples of how data, information, reports, and designs fit within the model. It also summarizes the history and purpose of the AGS Data Transfer Format and the new BS8574 standard for management of geotechnical data. Finally, it outlines future developments for AGS4.1 Beta including additional groups for transferring geotechnical investigation report data and using cross sections.
The document discusses plans to analyze technical requirements for integrating repositories into the Virtual Open Access Agriculture & Aquaculture Repository (VOA3R) federation. It outlines general procedures for analyzing repositories that support the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and those that do not. It also describes plans to analyze metadata elements, formats, languages, and other attributes across repositories to homogenize metadata without losing information when integrating repositories into the VOA3R federation.
Researchers require infrastructures that ensure a maximum of accessibility, stability and reliability to facilitate working with and sharing of research data. Such infrastructures are increasingly summarised under the term Research Data Repositories (RDR). The project re3data.org – Registry of Research Data Repositories – began to index research data repositories in 2012 and offers researchers, funding organisations, libraries and publishers an overview of the heterogeneous research data repository landscape. In December 2014 re3data.org listed more than 1,030 research data repositories, which are described in detail using the re3data.org schema (http://dx.doi.org/10.2312/re3.003). Information icons help researchers easily identify an adequate repository for the storage and reuse of their data. This talk describes the heterogeneous RDR landscape and presents a typology of institutional, disciplinary, multidisciplinary and project-specific RDR. Further, it outlines the features of re3data.org and shows current developments for integration into data management planning tools and other services.
By the end of 2015 re3data.org and Databib (Purdue University, USA) will merge their services, which will then be managed under the auspices of DataCite. The aim of this merger is to reduce duplication of effort and to serve the research community better with a single, sustainable registry of research data repositories. The talk will present this organisational development as a best practice example for the development of international research information services.
This slideshow was used in an Introduction to Research Data Management course taught for the Mathematical, Physical and Life Sciences Division, University of Oxford, on 2015-02-09. It provides an overview of some key issues, looking at both day-to-day data management, and longer term issues, including sharing, and curation.
State Survey Experience with the National Geothermal Database System (Denise Hills)
Presentation made at the 2013 Geological Society of America Annual Meeting, held in Denver, CO, in October, about the process of developing the National Geothermal Database System as an iterative process between the system developers and the content creators. It also highlights some of the data preservation issues that plague geological sample archives, particularly at state surveys.
FAO's work focuses on reducing hunger and improving living conditions by collecting, analyzing, interpreting and disseminating agricultural information. The organization developed metadata standards and application profiles to facilitate information sharing, including the AGRIS Application Profile for bibliographic records and a Learning Resources Application Profile. FAO also maintains AGROVOC, an agricultural ontology to enhance subject indexing and retrieval across languages.
R2R and BCO-DMO are linked oceanographic data repositories that provide metadata for datasets collected on ocean research vessels and expeditions. They use linked data to improve discovery of datasets across repositories and to attribute datasets to contributors. R2R catalogs vessel instrumentation and contains over 500k triples, while BCO-DMO catalogs PI-submitted datasets, including over-land deployments, and contains over 2 million triples. The repositories overlap in contributors and some cruises, and link metadata to external sources such as DBpedia.
#4 FAIR - Provenance as an element of FAIR data principles - 20-09-17 (ARDC)
Margie Smith
Full Webinar: https://youtu.be/EDhJTCm9RN8
Transcript: https://www.slideshare.net/AustralianNationalDataService/transcript-4-fair-r-for-reusable
Other webinars in the series: http://www.ands.org.au/news-and-events/events/fair-webinar-series
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is a widely used ETL tool for processing, indexing and ingesting data into a serving stack for search. Milvus is a production-ready open-source vector database. This talk shows how to use Spark to process unstructured data, extract vector representations, and push the vectors to the Milvus vector database for search serving.
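The pipeline described above — extract vectors from unstructured data, load them into a vector store, then serve similarity search — can be illustrated with a deliberately tiny stand-in. Everything here is hypothetical scaffolding for the concept, not the Spark or Milvus APIs: `embed` is a toy character-bigram featuriser in place of a trained model, and `TinyVectorIndex` replaces Milvus with an in-memory cosine-similarity scan:

```python
import math

def embed(text, dim=8):
    # Toy stand-in for an embedding model: bucket character bigrams
    # into a fixed-size vector and L2-normalise it.
    v = [0.0] * dim
    for i in range(len(text) - 1):
        v[(ord(text[i]) + ord(text[i + 1])) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

class TinyVectorIndex:
    """Minimal in-memory stand-in for a vector database: insert
    (id, vector) rows at ingest time, query by cosine similarity."""
    def __init__(self):
        self.rows = []

    def insert(self, doc_id, vec):
        self.rows.append((doc_id, vec))

    def search(self, query_vec, k=3):
        # Vectors are unit-length, so the dot product is the cosine similarity.
        scored = [(sum(a * b for a, b in zip(vec, query_vec)), doc_id)
                  for doc_id, vec in self.rows]
        return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]
```

In a production pipeline the ingest loop would be a distributed Spark job and the index a Milvus collection; the shape of the flow (embed at ingest, embed the query, rank by vector similarity) is the same.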
sers, Applications and the Community of Practice for the Air Quality ScenarioRudolf Husar
The document discusses the GEOSS (Global Earth Observation System of Systems) architecture for the air quality community. It proposes an architecture where air quality services could register with the GEOSS registry and be discovered and invoked by users. This would allow data analysts to compose and visualize air quality data workflows to inform decision makers. It also discusses establishing an air quality community of practice to facilitate collaboration.
2008-05-05 GEOSS UIC-ADC AQ Scen W shop TorontoRudolf Husar
The document discusses the GEOSS (Global Earth Observation System of Systems) architecture for the air quality community. It proposes an architecture where air quality services register with the GEOSS registry and are discoverable through the GEOSS clearinghouse. This would allow users to find, select, and link to relevant air quality services. The architecture envisions community air quality catalogs that aggregate catalog listings and allow users to access data and models through composed workflows.
This document summarizes the fifth annual HDF workshop sponsored by ESDIS and NCSA. It provides an overview of the status of ESDIS, HDF/HDF-EOS, and plans for the future. Over 750 terabytes of Terra and Landsat 7 data have been processed and made available. Some instruments like ASTER and CERES now have validated data while others like MODIS are still being reprocessed. Future plans include installing data pools at DAACs and procuring an EMD contract to support ongoing EOS operations. The community advisory process involves groups like UWGs, DAWG, and SWGD to provide feedback. HDF is a file format for scientific data while HDF-EOS is the
How to Rapidly Configure Oracle Life Sciences Data Hub (LSH) to Support the M...Perficient
This document outlines best practices for rapidly configuring Oracle Life Sciences Data Hub (LSH) to support patient data management. It discusses data flows, conforming data to standards, necessary utilities and tools, infrastructure requirements, and implementation process. The presentation recommends hosting the LSH environment with BioPharm for a turnkey solution, reducing time and risks compared to a custom implementation.
2013 OHSUG - Use Cases for Using the Program Type View in Oracle Life Science...Perficient
This document outlines best practices for rapidly configuring Oracle Life Sciences Data Hub (LSH) to support patient data management. It discusses data flows, conforming data to standards, using utilities and tools, infrastructure requirements, and implementation process. The presentation provides examples of how to set up utilities, production environments, and recommends using a pre-configured solution from BioPharm Systems to quickly get a best practice LSH environment operational.
Curation and Preservation of Crystallography DataManjulaPatel
A presentation given by Manjula Patel (UKOLN) at "Chemistry in the Digital Age: A Workshop connecting research and education", June 11-12th 2009, Penn State University,
http://www.chem.psu.edu/cyberworkshop09
A HYBRID LEARNING ALGORITHM IN AUTOMATED TEXT CATEGORIZATION OF LEGACY DATAijaia
The goal of this research is to develop an algorithm to automatically classify measurement types from NASA’s airborne measurement data archive. The product has to meet specific metrics in term of accuracy, robustness and usability, as the initial decision-tree based development has shown limited applicability due to its resource intensive characteristics. We have developed an innovative solution that is much more efficient while offering comparable performance. Similar to many industrial applications, the data available are noisy and correlated; and there is a wide range of features that are associated with the type of measurement to be identified. The proposed algorithm uses a decision tree to select features and determine their weights.A weighted Naive Bayes is used due to the presence of highly correlated inputs. The development has been successfully deployed in an industrial scale, and the results show that the development is well-balanced in term of performance and resource requirements.
A HYBRID LEARNING ALGORITHM IN AUTOMATED TEXT CATEGORIZATION OF LEGACY DATAgerogepatton
The goal of this research is to develop an algorithm to automatically classify measurement types from NASA’s airborne measurement data archive. The product has to meet specific metrics in term of accuracy, robustness and usability, as the initial decision-tree based development has shown limited applicability due to its resource intensive characteristics. We have developed an innovative solution that is much more efficient while offering comparable performance. Similar to many industrial applications, the data available are noisy and correlated; and there is a wide range of features that are associated with the type of measurement to be identified. The proposed algorithm uses a decision tree to select features and determine their weights.A weighted Naive Bayes is used due to the presence of highly correlated inputs. The development has been successfully deployed in an industrial scale, and the results show that the development is well-balanced in term of performance and resource requirements.
A presentation given as part of the DC101 training course run by the DCC at Oxford University in June 2010. The course provided data management guidance for researchers.
OAIS and It's Applicability for Libraries, Archives, and Digital Repositories...faflrt
ALA/FAFLRT Workshop on Open Archival Information Service (OAIS). Presented by Robin Dale, RLG. Sponsored by ALA Federal and Armed Forces Libraries Roundtable (FAFLRT). Presented on June 16, 2001 at the ALA Annual Conference.
The Research Data Alliance (RDA) has developed a Catalogue of Metadata standards and tools aimed at researchers and those who support them. In its new version, the Metadata Standards Catalog will provide much greater detail about metadata standards and tools, and through its new API - it will be usable within other applications. It will also provide a platform for furthering the work of the RDA Metadata Interest Group, which is seeking to improve the interoperability of metadata in different standards by working towards semi-automatically generated converters.
AGS Members' Day 2015 - Data Transfer Format and BIM PresentationForumCourt
This document discusses systems theory and the classification of data, information, knowledge, understanding, and wisdom according to Russell Ackoff. It then applies this model to geotechnical engineering, providing examples of how data, information, reports, and designs fit within the model. It also summarizes the history and purpose of the AGS Data Transfer Format and the new BS8574 standard for management of geotechnical data. Finally, it outlines future developments for AGS4.1 Beta including additional groups for transferring geotechnical investigation report data and using cross sections.
The document discusses plans to analyze technical requirements for integrating repositories into the Virtual Open Access Agriculture & Aquaculture Repository (VOA3R) federation. It outlines general procedures for analyzing repositories that support the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and those that do not. It also describes plans to analyze metadata elements, formats, languages, and other attributes across repositories to homogenize metadata without losing information when integrating repositories into the VOA3R federation.
Researchers require infrastructures that ensure a maximum of accessibility, stability and reliability to facilitate working with and sharing of research data. Such infrastructures are being increasingly summarised under the term Research Data Repositories (RDR). The project re3data.org – Registry of Research Data Repositories – began to index research data repositories in 2012 and offers researchers, funding organisations, libraries and publishers an overview of the heterogeneous research data repository landscape. In December 2014 re3data.org listed more than 1,030 research data repositories, which are described in detail using the re3data.org schema (http://dx.doi.org/10.2312/re3.003). Information icons help researchers to identify easily an adequate repository for the storage and reuse of their data. This talk describes the heterogeneous RDR landscape and presents a typology of institutional, disciplinary, multidisciplinary and project-specific RDR. Further, it outlines the features of re3data. org and it shows current developments for integration into data management planning tools and other services.
By the end of 2015 re3data.org and Databib (Purdue University, USA) will merge their services, which will then be managed under the auspices of DataCite. The aim of this merger is to reduce duplication of effort and to serve the research community better with a single, sustainable registry of research data repositories. The talk will present this organisational development as a best practice example for the development of international research information services.
This slideshow was used in an Introduction to Research Data Management course taught for the Mathematical, Physical and Life Sciences Division, University of Oxford, on 2015-02-09. It provides an overview of some key issues, looking at both day-to-day data management, and longer term issues, including sharing, and curation.
State Survey Experience with the National Geothermal Database System – Denise Hills
Presentation made at 2013 Annual Geological Society of America meeting held in Denver, CO, in October, about the process of developing the National Geothermal Database System as an iterative process between the system developers and the content creators. Also highlights some of the data preservation issues that plague geological sample archives, particularly at state surveys.
FAO's work focuses on reducing hunger and improving living conditions by collecting, analyzing, interpreting and disseminating agricultural information. The organization developed metadata standards and application profiles to facilitate information sharing, including the AGRIS Application Profile for bibliographic records and a Learning Resources Application Profile. FAO also maintains AGROVOC, an agricultural ontology to enhance subject indexing and retrieval across languages.
R2R and BCO-DMO are linked oceanographic data repositories that provide metadata for datasets collected from ocean research vessels and expeditions. They utilize linked data to improve discovery of datasets across repositories and to attribute datasets to contributors. R2R catalogs vessel instrumentation and contains over 500k triples, while BCO-DMO catalogs PI-submitted datasets, including over-land deployments, and contains over 2 million triples. The repositories overlap in contributors and some cruises, and link metadata to external sources like DBPedia.
#4 FAIR - Provenance as an element of FAIR data principles - 20-09-17 – ARDC
Margie Smith
Full Webinar: https://youtu.be/EDhJTCm9RN8
Transcript: https://www.slideshare.net/AustralianNationalDataService/transcript-4-fair-r-for-reusable
Other webinars in the series: http://www.ands.org.au/news-and-events/events/fair-webinar-series
Similar to Provenance Context Content Standard Use Case with Physical Objects (20)
Building Production Ready Search Pipelines with Spark and Milvus – Zilliz
Spark is a widely used ETL tool for processing, indexing and ingesting data into the serving stack for search. Milvus is a production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to the Milvus vector database for search serving.
Generating privacy-protected synthetic data using Secludy and Milvus – Zilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Digital Marketing Trends in 2024 | Guide for Staying Ahead – Wask
https://www.wask.co/ebooks/digital-marketing-trends-in-2024
Feeling lost in the digital marketing whirlwind of 2024? Technology is changing, consumer habits are evolving, and staying ahead of the curve feels like a never-ending pursuit. This e-book is your compass. Dive into actionable insights to handle the complexities of modern marketing. From hyper-personalization to the power of user-generated content, learn how to build long-term relationships with your audience and unlock the secrets to success in the ever-shifting digital landscape.
What do a Lego brick and the XZ backdoor have in common? – Speck&Tech
ABSTRACT: At first glance, a Lego brick and the XZ backdoor might seem to have in common only the fact that both are building blocks, or dependencies, of creative and software projects. In reality, a Lego brick and the case of the XZ backdoor have much more in common than that.
Join the presentation to dive into a story of interoperability, standards and open formats, and then discuss the important role that contributors play in a sustainable open source community.
BIO: An advocate of free software and of open, standard formats. She has been an active member of the Fedora and openSUSE projects and co-founded the LibreItalia Association, where she was involved in several LibreOffice-related events, migrations and training activities. She previously worked on LibreOffice migrations and training courses for several public administrations and private organizations. Since January 2020 she has worked at SUSE as a Software Release Engineer for Uyuni and SUSE Manager, and when she is not following her passion for computers and for Geeko, she cultivates her curiosity about astronomy (which is where her nickname, deneb_alpha, comes from).
Project Management Semester Long Project - Acuity – jpupo2018
Acuity is an innovative learning app designed to transform the way you engage with knowledge. Powered by AI technology, Acuity takes complex topics and distills them into concise, interactive summaries that are easy to read & understand. Whether you're exploring the depths of quantum mechanics or seeking insight into historical events, Acuity provides the key information you need without the burden of lengthy texts.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Salesforce Integration for Bonterra Impact Management (fka Social Solutions A... – Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on integration of Salesforce with Bonterra Impact Management.
Interested in deploying an integration with Salesforce for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Programming Foundation Models with DSPy - Meetup Slides – Zilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
UiPath Test Automation using UiPath Test Suite series, part 6 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features that provide convenience and capability sacrifice security. This best practices guide outlines steps users can take to better protect their personal devices and information.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf – Chart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers – akankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Provenance Context Content Standard Use Case with Physical Objects
1. Applying the Emerging PCCS to Physical Objects in a Core Repository
A Use Case to Demonstrate Validity of Broader Community Adaptation
Denise J. Hills, Geological Survey of Alabama
Sarah Ramdeen, UNC-Chapel Hill SILS
H. K. Ramapriyan, NASA Goddard Space Flight Center
2. Why Community Standards?
Data sets prepared and/or preserved with community-accepted data management standards are more likely to be used, now and in the future.
Standards developed using suggestions and assessments by a diverse community enable wider adoption without necessarily needing customization.
AGU Annual Meeting, 9 December 2013
3. Provenance and Context Content Standard (PCCS)
The ESIP Federation's Data Stewardship Committee developed the PCCS matrix based on community input.
The focus is on "what" needs to be preserved, rather than "how."
Developed primarily with NASA/NOAA remote-sensing missions in mind, but meant to be easily adapted to other Earth Science data sets.
The current matrix has 8 high-level categories.
5. PCCS – Content Attributes
Content name
More detailed definition and description
Indication of why the item needs to be preserved
Criteria for quality assessment
Priority for preservation of the item
Source of the content item during the data life cycle
Project phase for capturing the item
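The attribute list above amounts to a simple record type. A minimal Python sketch, assuming hypothetical field names that paraphrase the slide's attributes (this is not an official PCCS schema):

```python
from dataclasses import dataclass

@dataclass
class ContentItem:
    """One row of the PCCS content-attribute matrix (field names are illustrative)."""
    name: str              # Content name
    description: str       # More detailed definition and description
    rationale: str         # Why the item needs to be preserved
    quality_criteria: str  # Criteria for quality assessment
    priority: str          # Priority for preservation (e.g., "High")
    source: str            # Source of the item during the data life cycle
    capture_phase: str     # Project phase for capturing the item

# Example: the "Permitting" item described later in this deck
permitting = ContentItem(
    name="Permitting",
    description="Permit application with associated documentation",
    rationale="Resource information about the area",
    quality_criteria="Complete and accurate form",
    priority="High",
    source="Well owner/operator",
    capture_phase="Pre-operational",
)
```

A fixed record shape like this is what makes the attribute matrix machine-checkable rather than free-form documentation.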
6. About Use Cases
An approach to develop or refine the functional specifications of a system.
Intended to be characteristic of classes of scenarios, although specific real-world examples may enable a fuller understanding of the strengths and weaknesses of what is being tested.
Use cases should attempt to cover the full "data life cycle."
8. Use Case: Applying PCCS to a Core Repository
The Geological Survey of Alabama (GSA) houses cores, cuttings, and other physical samples collected from oil and gas wells drilled in the state.
The repository also contains samples from other states (e.g., when they deaccession items) and from non-energy wells (e.g., wells drilled solely for research).
10. Why is GSA interested?
As a state agency, part of our mission is to make data available to the public.
GSA has not yet standardized records relating to physical samples, making data discovery difficult.
As with many other agencies, there is limited funding for preservation efforts, so GSA must be strategic.
12. Motivation for GSA to utilize PCCS
Better use of resources: time, money, training
Interoperability (and therefore the potential for data use and reuse) increases
Discoverability increases with standardization
13. Core Repository Documentation Available
Spreadsheets containing basic information:
Associated O&G well (always)
Location in TRS format (always)
Type of sample (usually)
Internal sample number (sometimes)
Footage and/or unit sampled (occasionally)
Date acquired (rarely)
Related resources (rarely)
14. Core Repository Documentation Available
From the associated O&G well:
Location in Lat/Long NAD1927 (almost always)
Operator information (always)
Permitting information, including drilling, logging, and completion dates (almost always)
Well TD (almost always)
Drilling logs (sometimes); sample depths can often be obtained from the drilling log
Related resources (sometimes); core analyses can give further information on units
15. Mapping PCCS High Level Categories to Physical Samples
Current Category → PhysObj Category
1) Preflight/Pre-Operations → 1) Site selection/predrilling
2) Product Data and Metadata → 2) Product Data
3) Documentation → 3) Documentation and Metadata
4) Calibration → 4) Recovery information*
16. Mapping PCCS High Level Categories to Physical Samples
Current Category → PhysObj Category
5) Product Software → 5) Not Applicable*
6) Algorithm Input → 6) Conventions
7) Validation → 7) Not Applicable*
8) Software Tools → 8) Not Applicable*
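The two mapping slides can be captured as a lookup table. A minimal Python sketch (the dictionary name and the use of None for the starred "Not Applicable" entries are my own choices, not part of the PCCS):

```python
# Mapping of PCCS high-level categories to their proposed physical-object
# counterparts, as laid out on the two mapping slides. None marks the
# "Not Applicable*" entries flagged for further examination in Future Work.
PCCS_TO_PHYSOBJ = {
    "Preflight/Pre-Operations": "Site selection/predrilling",
    "Product Data and Metadata": "Product Data",
    "Documentation": "Documentation and Metadata",
    "Calibration": "Recovery information",
    "Product Software": None,
    "Algorithm Input": "Conventions",
    "Validation": None,
    "Software Tools": None,
}

# Categories that still lack a physical-object counterpart
unmapped = [cat for cat, phys in PCCS_TO_PHYSOBJ.items() if phys is None]
```

Making the gaps explicit (the None entries) is the point: the unmapped list is exactly the set of categories the Future Work slides ask about.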
17. Example PhysObj Content Attributes – Site Selection
Content name: Permitting
Definition and description: Permit application with associated documentation
Why the item needs to be preserved: Resource information about the area
Priority for preservation: High
Source of the content item during the data life cycle: Well owner/operator
Project phase for capturing the item: Pre-operational
QA of content: Complete and accurate form
18. Example PhysObj Content Attributes – Data and Metadata
Content name: Core Sample | Subsample
Definition and description: Physical object collected
Why the item needs to be preserved: Without the object, analyses cannot be done
QA of content: Preservation standards
Priority for preservation: High
Source of the content item during the data life cycle: Well owner/operator (initial) | Repository (post-accession)
Project phase for capturing the item: Post-drilling
19. Example PhysObj Content Attributes – Documentation
Content name: Metadata
Definition and description: Includes location, depth of measurement, techniques
Why the item needs to be preserved: Provenance is critical
QA of content: Comparison to robust metadata content model standards
Priority for preservation: High
Source of the content item during the data life cycle: Well owner/operator (initial) | Regulatory agency (initial) | Repository (post-accession)
Project phase for capturing the item: During drilling (collection)
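A repository could enforce the "QA of content" idea in these examples with a simple completeness check over such records. A hypothetical Python sketch (the required-attribute list, field names, and function name are assumptions for illustration, not part of the PCCS):

```python
# The seven content attributes every PCCS item is expected to carry
# (names paraphrase the attribute slides in this deck).
REQUIRED_ATTRIBUTES = [
    "content_name", "description", "rationale",
    "qa_of_content", "priority", "source", "capture_phase",
]

def missing_attributes(record: dict) -> list:
    """Return the required attributes that are absent or empty in a record."""
    return [attr for attr in REQUIRED_ATTRIBUTES if not record.get(attr)]

# The "Metadata" example from the Documentation slide, as a plain dict
metadata_item = {
    "content_name": "Metadata",
    "description": "Includes location, depth of measurement, techniques",
    "rationale": "Provenance is critical",
    "qa_of_content": "Comparison to robust metadata content model standards",
    "priority": "High",
    "source": "Well owner/operator | Regulatory agency | Repository",
    "capture_phase": "During drilling (collection)",
}

# A hypothetical partially documented item, as found in many legacy spreadsheets
incomplete_item = {"content_name": "Core box photo"}
```

Run against a legacy spreadsheet row, such a check would surface exactly the "sometimes/rarely" gaps listed on the Core Repository Documentation slides.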
20. Future Work
Categories in the PCCS that do not currently have a clearly identified physical object counterpart (e.g., Calibration; Validation) need further examination:
Has the item not been captured in the current repository, but should it be?
Has the item been captured, but not yet identified within the information available?
Is there a more universal description of the content category?
21. Future Work
Additional examination of category mapping on a more detailed level is needed to fully define each content item.
PCCS should be applied to additional physical repositories (additional use cases).
Ask us how!
22. Acknowledgements
The Data Preservation Committee of the ESIP Federation was fundamental to the development of the material presented.
23. TOWN HALL
Monday, 6:15-7:15 pm, Moscone South 306
Connecting Data Stakeholders for a Long-term Vision of Data Stewardship