Workshop on computational methods for materials science, presented at the Spring 2010 ACS conference. This workshop illustrates how high-throughput computation and automation can be used with quantum chemistry calculations to solve problems in materials discovery. Examples include catalysts, fuel cells, and OLEDs.
Elementary Landscape Decomposition of the Quadratic Assignment Problem (jfrchicanog)
This document discusses the elementary landscape decomposition of the Quadratic Assignment Problem (QAP). It begins with background on landscape theory and definitions. It then shows that the QAP fitness function can be decomposed into three elementary components. It discusses how this decomposition allows estimating autocorrelation parameters to analyze problem structure. Finally, it notes the decomposition provides insights and can inform algorithm design, and discusses applications to related problems like the Traveling Salesman Problem and DNA fragment assembly.
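The QAP objective that the decomposition applies to can be sketched in a few lines of Python. The tiny flow/distance matrices below are made-up data for illustration only, and the elementary-landscape decomposition itself is not reproduced here — the sketch just shows the fitness function and the swap neighborhood over which autocorrelation is defined.

```python
import itertools

def qap_fitness(perm, A, B):
    """QAP objective: sum over i, j of flow A[i][j] times distance B[perm[i]][perm[j]]."""
    n = len(perm)
    return sum(A[i][j] * B[perm[i]][perm[j]] for i in range(n) for j in range(n))

def swap_neighbors(perm):
    """All solutions reachable by exchanging two positions (the usual QAP neighborhood)."""
    for i, j in itertools.combinations(range(len(perm)), 2):
        q = list(perm)
        q[i], q[j] = q[j], q[i]
        yield tuple(q)

# Tiny 3x3 instance (hypothetical data, for illustration only)
A = [[0, 3, 1], [3, 0, 2], [1, 2, 0]]  # flows
B = [[0, 1, 4], [1, 0, 2], [4, 2, 0]]  # distances
best = min(itertools.permutations(range(3)), key=lambda p: qap_fitness(p, A, B))
```

Exhaustive enumeration is only feasible for toy sizes; on real instances the decomposition lets the mean fitness over a neighborhood be computed in closed form instead.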
Lec-16: Subspace/Transform Optimization
Addresses non-linearity in appearance manifolds with a piecewise-linear solution: query-driven local model learning, subspace indexing on the Grassmann manifold, and a direct Newton method for subspace optimization on the Grassmann manifold.
The document discusses subspace indexing on Grassmannian manifolds for large scale visual identification. It proposes using local subspace models built on neighborhoods defined by queries, but notes issues with computational complexity and lack of optimality. It then introduces Grassmannian and Stiefel manifolds to characterize subspace similarity and define distances. A model hierarchical tree is proposed to index subspaces through iterative merging based on distances on the Grassmannian manifold.
How to get the maximum performance from your AEP server. This talk will discuss ways to improve the execution time of short-running jobs and how to properly configure the server depending on the expected number of users as well as the average size and duration of individual jobs. Included will be examples of job pooling, database connection sharing, and parallel subprotocol tuning. Determining when to use cluster, grid, or load-balanced configurations, along with memory and CPU sizing guidelines, will also be discussed.
SharePoint 2007 provides several key collaboration and document management features:
Calendar, discussion boards, email notifications, libraries, and wikis allow for event scheduling, group communication, storing and sharing documents, and collaborative editing. Lists and imported spreadsheets enable tracking contacts and timelines. Sites and workspaces break a SharePoint site into sections while surveys, custom views, and web parts support additional functionality and customization.
Software can be used to speed up R&D into sustainable solutions such as alternative energy (batteries, fuel cells, biomass conversion), catalysts, and eliminating environmental toxins. The presentation gives an overview of the various methods and illustrates their application with case studies.
High-throughput Quantum Chemistry and Virtual Screening for Lithium Ion Batte... (BIOVIA)
The use of virtual structure libraries for computational screening to identify lead systems for further investigation has become a standard approach in drug discovery. Transferring this paradigm to challenges in materials science has only recently become possible due to advances in the speed of computational resources and the efficiency and stability of materials modeling packages. These advances allow individual calculation steps to be executed in sequence as a high-throughput quantum chemistry workflow, in which material systems of varying structure and composition are analyzed in an automated fashion and the results collected in a growing data record. This record can then be sorted and mined to identify lead candidates and establish critical structure-property limits within a given chemical design space. To date, only a small number of studies have been reported in which quantum chemical calculations are used in a high-throughput fashion to compute properties and screen for optimal materials solutions. However, with time, high-throughput computational screening will become central to advanced materials research.
In this presentation, the use of high-throughput quantum chemistry to analyze and screen a materials structure library is demonstrated for Li-Ion battery additives based on ethylene carbonate (EC).
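The screening loop described above — enumerate candidates, compute a property for each, collect results in a growing record, then rank — can be sketched as follows. The `reduction_potential` stub stands in for a real quantum chemistry calculation, and the candidate names and scores are invented for illustration.

```python
def reduction_potential(candidate):
    """Stub standing in for a quantum chemistry calculation
    (in practice: build an input file, run the QM code, parse the output)."""
    return candidate["score"]  # hypothetical precomputed value

# Hypothetical virtual structure library of EC-based additives
candidates = [
    {"name": "EC-derivative-A", "score": 1.2},
    {"name": "EC-derivative-B", "score": 0.7},
    {"name": "EC-derivative-C", "score": 1.9},
]

# Run every candidate, collect results in a growing record, then rank and pick leads.
record = [{"name": c["name"], "potential": reduction_potential(c)} for c in candidates]
ranked = sorted(record, key=lambda r: r["potential"], reverse=True)
leads = ranked[:2]
```

In a real workflow the loop body dispatches jobs to a cluster and the record lives in a database, but the sort-and-mine step at the end is the same.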
Materials science applications of HPC can help address challenges in developing new materials by enabling high-throughput screening of thousands of potential candidates using computational modeling. This reduces the time and cost of research and development compared to experimental methods alone. Examples discussed include identifying catalysts for fuel cells and optimizing electrolytes for lithium ion batteries through automated quantum chemistry calculations on large supercomputers.
This document discusses CT reconstruction artifacts and scatter correction algorithms from RX Solutions. It provides an overview of RX Solutions' company and CT system portfolio. It then describes various types of CT artifacts, scatter artifacts in particular, and different approaches to scatter correction, including air gaps, anti-scatter grids, and a posteriori correction methods. The document focuses on RX Solutions' own method for scatter correction, which operates directly on projections without simulation or prior knowledge of acquisition settings or the sample. Examples demonstrate the effectiveness of RX Solutions' scatter correction at reducing artifacts in CT reconstruction.
This document discusses how data is increasingly dominating high performance computing workloads. It notes that while computing power doubles every two years, data storage and movement capabilities are not keeping pace. This is leading to a "data tsunami" as experiments and simulations generate terabytes of data per day. The document then summarizes Sun Microsystems' end-to-end infrastructure for data-centric HPC workflows, including their Lustre parallel storage system, unified storage, tape archives, high performance computing blades, and InfiniBand switches. It positions Sun as uniquely able to deliver an integrated solution from computation to long-term data retention to help users cope with the challenges posed by rapidly growing datasets.
Overview of the Exascale Additive Manufacturing Project (inside-BigData.com)
The Exascale Additive Manufacturing (ExaAM) project aims to accelerate the adoption of additive manufacturing by enabling the fabrication of qualifiable metal parts with minimal trial and error. ExaAM will couple high-fidelity sub-grid simulations within a continuum process simulation to determine microstructure and properties at each time-step using local conditions. ExaAM involves multiple computational codes, including ALE3D, Diablo, Truchas, MEUMAPPS, and AMPE, which model different additive manufacturing physics across continuum, meso, and micro scales. The goal is to utilize exascale concurrency and locality to dynamically bridge scales through an adaptive, task-based approach.
The document summarizes a conference on revamping the audit approach using XBRL-tagged accounting equation data. It discusses modeling the audit using a "top-cycle" approach, developing a domain-specific language for auditing, and applying XBRL tagging to all phases of a new 5-phase audit process for continuous, real-time auditing and reporting. The conference brings together academics and practitioners to advance this new computational auditing approach using XBRL data processing and modeling.
Using reduced system models for vibration design and validation.
1) Equivalent models are used to reduce complexity while preserving key behaviors through homogenization and model updating.
2) Model reduction techniques like component mode synthesis represent the system with subspace bases to enable coupling of test and finite element models.
3) Energy coupling methods allow assembly of disjoint reduced component models through computation of interface energies.
Atomate: a tool for rapid high-throughput computing and materials discovery (Anubhav Jain)
Atomate is a tool for automating materials simulations and high-throughput computations. It provides predefined workflows for common calculations like band structures, elastic tensors, and Raman spectra. Users can customize workflows and simulation parameters. FireWorks executes workflows on supercomputers and detects/recovers from failures. Data is stored in databases for analysis with tools like pymatgen. The goal is to make simulations easy and scalable by automating tedious steps and leveraging past work.
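The workflow idea described here — named calculation steps with explicit dependencies, executed in order with results collected in a database — can be illustrated with a minimal sketch. This is not the actual atomate/FireWorks API; the `Workflow` class and the task names are invented for illustration.

```python
from collections import deque

class Workflow:
    """Minimal sketch of a FireWorks-style workflow: named tasks with
    explicit dependencies, results collected in a 'database' dict."""

    def __init__(self):
        self.tasks, self.deps = {}, {}

    def add(self, name, func, deps=()):
        self.tasks[name] = func
        self.deps[name] = list(deps)

    def run(self):
        db, done = {}, set()
        pending = deque(self.tasks)
        while pending:
            name = pending.popleft()
            if set(self.deps[name]) <= done:
                db[name] = self.tasks[name](db)  # task sees upstream results
                done.add(name)
            else:
                pending.append(name)             # dependencies not ready; requeue
        return db

# Hypothetical band-structure workflow: relax -> static -> bands
wf = Workflow()
wf.add("relax", lambda db: "relaxed-structure")
wf.add("static", lambda db: f"energy({db['relax']})", deps=["relax"])
wf.add("bands", lambda db: f"bandstructure({db['static']})", deps=["static"])
results = wf.run()
```

The real systems add what this sketch omits: queue submission, failure detection and recovery, and persistent storage of every step's output for later mining.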
This document discusses using hybrid cloud and grid infrastructure for high-throughput computational science. It provides an overview of the Nimrod toolkit, which supports parameter sweeps, optimization, and workflows across distributed resources. A recent experiment used Nimrod to complete jobs faster on grid resources than Amazon EC2. It also outlines a potential strawman project called GEMAP to enable grid-enabled microscopy across the Pacific using remote microscopes, compute clusters, storage, and visualization portals.
Overview of the structured data science domain, OSS machine learning platforms and algorithms. Using the JPMML family of libraries to implement a unified, production-oriented workflow.
Model-Driven Physical-Design for Future Nanoscale Architectures (Ciprian Teodorov)
This document discusses model-driven physical design approaches for future nanoscale architectures. It proposes a generic physical design framework based on a common structural domain model. This model-based approach aims to maximize tool reuse across different nanoscale technologies. It also separates algorithmic and architectural concerns by modeling tools as model transformations. An example nanoscale architecture template called R2D NASIC is developed using this framework and evaluated. Results show improvements in density, performance, and maximum pipeline throughput compared to a baseline. Overall, the model-driven approach seeks to provide a common vocabulary and design flow for tackling challenges in physical design for emerging nanotechnologies.
Colored Petri nets theory and applications (Abu Hussein)
This document discusses colored Petri nets (CP-nets) and their applications. CP-nets combine Petri nets with programming languages to model systems involving concurrency, communication, and resource sharing. They allow for simulation and formal verification. The document provides examples of CP-net applications in various domains including protocols, software, hardware, control systems, and military systems. It also describes how CP-net models can be used to automatically generate code for system implementations.
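The token game at the heart of a CP-net can be illustrated with a toy sketch: places hold multisets of colored tokens, and a transition fires by consuming tokens from its input places and producing tokens on its output places. Arc expressions and guards, which real CP-nets support, are omitted; the `CPNet` class and the marking below are invented for illustration.

```python
from collections import Counter

class CPNet:
    """Toy colored Petri net: places map to multisets (Counters) of colored tokens."""

    def __init__(self, marking):
        self.marking = {place: Counter(tokens) for place, tokens in marking.items()}

    def enabled(self, consume):
        """A transition is enabled if every required token is present in sufficient number."""
        return all(self.marking[p][tok] >= n
                   for p, toks in consume.items() for tok, n in toks.items())

    def fire(self, consume, produce):
        """Fire a transition: remove consumed tokens, add produced tokens."""
        if not self.enabled(consume):
            raise RuntimeError("transition not enabled")
        for p, toks in consume.items():
            self.marking[p] -= Counter(toks)
        for p, toks in produce.items():
            self.marking[p] += Counter(toks)

# Two processes ("red", "blue") competing for one shared resource token "r".
net = CPNet({"idle": ["red", "blue"], "resource": ["r"], "busy": []})
net.fire(consume={"idle": {"red": 1}, "resource": {"r": 1}},
         produce={"busy": {"red": 1}})
```

After the firing, "red" holds the resource and the same transition is no longer enabled for "blue" — exactly the kind of resource-sharing conflict CP-nets are used to model and verify.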
Simulation Data Management using Aras and SharePoint (Aras)
This document describes Advatech Pacific's solution for NASA to manage simulation data and requirements for mission design. The solution uses Aras Innovator to implement a Simulation Bill of Materials (SBOM) for linking analysis models. It also integrates with SharePoint and links requirements to analyses. Tight and loose integration of CAD and analysis tools like SolidWorks, STK, and Thermal Desktop were demonstrated.
Ruleml2012 - A production rule-based framework for causal and epistemic reaso... (RuleML)
The document describes a production rule-based framework for causal and epistemic reasoning. The framework combines event calculus foundations with a discrete event calculus (DECKT) to perform epistemic reasoning about events, knowledge, and time. It uses a rule-based forward-chaining production system to implement the framework and enable online/offline reasoning about dynamic domains.
Chemical Databases and Open Chemistry on the Desktop (Marcus Hanwell)
The modern chemist has access to large databases containing both experimental and calculated data. The power of HPC resources continues to increase, with more practitioners having routine access to powerful computational chemistry tools. This places an increasingly high burden on users to assimilate these resources into their workflow in order to use them effectively. The creation of an open, extensible application framework that puts computational tools, data, and domain-specific knowledge at the fingertips of chemists is increasingly important. A data-centric approach to chemistry, storing all data in a searchable database, will empower users to efficiently collaborate, innovate, and push the frontiers of research. Providing an open, user-friendly, and extensible application will open up new tools to experimental chemists, while giving computational chemists the ability to address greater challenges. Additionally, by distributing experimental and computational data across the research community, incorporating cheminformatics analytics techniques, and providing visual search for chemical structures, the workflow of both groups can be significantly improved. This requires suitable data formats for data exchange, and databases with appropriate APIs for querying and uploading data. This talk will discuss recent progress made in developing a suite of open chemistry applications on the desktop. The applications can query online databases, such as the NIH structure resolver service, download and manipulate structures, and prepare input files for standalone computational chemistry codes. Another application, developed to submit jobs to HPC resources and to monitor and retrieve results, will also be shown, along with a desktop chemistry database browser. The Quixote project aims to establish standards for data exchange in computational chemistry, along with data repositories for organizations.
Establishing these standards is important to promote open, reproducible chemistry, and their integration into user-friendly desktop applications will promote their integration in the standard workflow of researchers.
Discovering new functional materials for clean energy and beyond using high-t... (Anubhav Jain)
The research group develops computational methods and machine learning models to design new functional materials using high-throughput computing. This includes developing databases of materials properties, benchmarking machine learning algorithms, and applying natural language processing to materials design. Recent work also involves automating materials synthesis and characterization. The group maintains several open-source software packages that power their research.
The document discusses lessons learned from 4 years of simulating an innovative network processor, emphasizing the need to leverage computing resources through automated testing, intelligent test generation, and storing large numbers of test executions and results in databases to methodically close issues. It provides recommendations for planning verification efforts and creating balanced testing pipelines to efficiently debug problems and keep verification teams productive.
Insights and Lessons Learned Verifying the QoS Engine of a Network Processor (DVClub)
The document discusses lessons learned from four years of simulating a network processor design. It summarizes that using more CPUs for simulation, rather than more engineers, allows for hundreds of CPU years of simulation and thousands of automated tests. This requires investment in the testing environment to increase productivity and keep the work manageable. The key takeaway is to leverage computing resources to drive chip tape-out with high confidence.
The document describes a project to generate synthetic sky catalogs for the Dark Energy Survey through large cosmological simulations. An automated workflow was developed using Apache Airavata to run the simulations across multiple XSEDE resources. The workflow involves running CAMB to generate power spectra, 2LPTic to generate initial conditions, and LGadget to evolve them. It was able to run four full simulations, consuming over 300,000 service units, faster than manual submission would have allowed. Lessons learned include differences in MPI libraries and job scripts across resources and the time needed to migrate codes and data. Future goals include integrating post-processing and moving to new XSEDE resources like Trestles and Stampede.
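The three-stage chain (CAMB for power spectra, 2LPTic for initial conditions, LGadget for evolution) amounts to a staged pipeline where each stage's output feeds the next, with independent simulation boxes run concurrently. A minimal sketch, with placeholder function bodies standing in for the real codes and hypothetical box names:

```python
from concurrent.futures import ThreadPoolExecutor

# Each function stands in for a real code in the chain; the bodies are placeholders
# (in practice each stage is a batch job submitted to an XSEDE resource).
def camb(cosmology):
    return {"power_spectrum": f"P(k) for {cosmology}"}

def twolpt_ic(spec):
    return {"initial_conditions": f"IC from {spec['power_spectrum']}"}

def lgadget(ic):
    return {"snapshot": f"evolved {ic['initial_conditions']}"}

def run_pipeline(box):
    """Run the three stages in order, passing each output downstream."""
    return lgadget(twolpt_ic(camb(box)))

# Four independent simulation boxes can run concurrently,
# which is where the speedup over manual one-at-a-time submission comes from.
boxes = ["box1", "box2", "box3", "box4"]
with ThreadPoolExecutor() as pool:
    snapshots = list(pool.map(run_pipeline, boxes))
```

The workflow engine's real job is the part the sketch elides: staging data between machines, generating per-resource job scripts, and resuming after failures.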
ScienceCloud: Collaborative Workflows in Biologics R&D (BIOVIA)
The life sciences industry has undergone dramatic changes, and effective global collaboration has become a key success factor in this new age. BIOVIA is providing a hosted and comprehensive solution stack for externalized, collaborative research for pharma/biotech and CROs to address these new challenges. Recently we added support for biologics data management and IP capture. In this talk we will present comprehensive collaborative capabilities in antibody characterization and development: capabilities to analyze, annotate, and predict developability as part of a framework that facilitates secure data sharing and collaboration.
1. The document discusses Discngine's Tibco Spotfire Pipeline Pilot connector, which allows graphs stored in Pipeline Pilot to be accessed and visualized in Spotfire.
2. It describes the architecture of the connector and how it executes Pipeline Pilot protocols to generate HTML pages for visualization in Spotfire.
3. Challenges in integrating the large Spotfire API and synchronizing client and server datasets are also discussed.
Materials science applications of HPC can help address challenges in developing new materials by enabling high-throughput screening of thousands of potential candidates using computational modeling. This reduces the time and cost of research and development compared to experimental methods alone. Examples discussed include identifying catalysts for fuel cells and optimizing electrolytes for lithium ion batteries through automated quantum chemistry calculations on large supercomputers.
This document discusses CT reconstruction artifacts and scatter correction algorithms from RX Solutions. It provides an overview of RX Solutions' company and CT system portfolio. It then describes various types of CT artifacts like scatter artifacts, and different approaches to scatter correction including air gaps, anti-scatter grids, and a posteriori correction methods. The document focuses on RX Solutions' own method for scatter correction, which operates directly on projections without simulation or prior knowledge of acquisition settings or the sample. Examples demonstrate the effectiveness of RX Solutions' scatter correction at reducing artifacts in CT reconstruction.
This document discusses how data is increasingly dominating high performance computing workloads. It notes that while computing power doubles every two years, data storage and movement capabilities are not keeping pace. This is leading to a "data tsunami" as experiments and simulations generate terabytes of data per day. The document then summarizes Sun Microsystems' end-to-end infrastructure for data-centric HPC workflows, including their Lustre parallel storage system, unified storage, tape archives, high performance computing blades, and InfiniBand switches. It positions Sun as uniquely able to deliver an integrated solution from computation to long-term data retention to help users cope with the challenges posed by rapidly growing datasets.
Overview of the Exascale Additive Manufacturing Projectinside-BigData.com
The Exascale Additive Manufacturing (ExaAM) project aims to accelerate the adoption of additive manufacturing by enabling the fabrication of qualifiable metal parts with minimal trial and error. ExaAM will couple high-fidelity sub-grid simulations within a continuum process simulation to determine microstructure and properties at each time-step using local conditions. ExaAM involves multiple computational codes, including ALE3D, Diablo, Truchas, MEUMAPPS, and AMPE, which model different additive manufacturing physics across continuum, meso, and micro scales. The goal is to utilize exascale concurrency and locality to dynamically bridge scales through an adaptive, task-based approach.
The document summarizes a conference on revamping the audit approach using XBRL-tagged accounting equation data. It discusses modeling the audit using a "top-cycle" approach, developing a domain-specific language for auditing, and applying XBRL tagging to all phases of a new 5-phase audit process for continuous, real-time auditing and reporting. The conference brings together academics and practitioners to advance this new computational auditing approach using XBRL data processing and modeling.
Using reduced system models for vibration design and validation.
1) Equivalent models are used to reduce complexity while preserving key behaviors through homogenization and model updating.
2) Model reduction techniques like component mode synthesis represent the system with subspace bases to enable coupling of test and finite element models.
3) Energy coupling methods allow assembly of disjoint reduced component models through computation of interface energies.
Atomate: a tool for rapid high-throughput computing and materials discoveryAnubhav Jain
Atomate is a tool for automating materials simulations and high-throughput computations. It provides predefined workflows for common calculations like band structures, elastic tensors, and Raman spectra. Users can customize workflows and simulation parameters. FireWorks executes workflows on supercomputers and detects/recovers from failures. Data is stored in databases for analysis with tools like pymatgen. The goal is to make simulations easy and scalable by automating tedious steps and leveraging past work.
This document discusses using hybrid cloud and grid infrastructure for high-throughput computational science. It provides an overview of the Nimrod toolkit, which supports parameter sweeps, optimization, and workflows across distributed resources. A recent experiment used Nimrod to complete jobs faster on grid resources than Amazon EC2. It also outlines a potential strawman project called GEMAP to enable grid-enabled microscopy across the Pacific using remote microscopes, compute clusters, storage, and visualization portals.
Overview of the structured data science domain, OSS machine learning platforms and algorithms. Using the JPMML family of libraries to implement a unified, production-oriented workflow.
Model-Driven Physical-Design for Future Nanoscale ArchitecturesCiprian Teodorov
This document discusses model-driven physical design approaches for future nanoscale architectures. It proposes a generic physical design framework based on a common structural domain model. This model-based approach aims to maximize tool reuse across different nanoscale technologies. It also separates algorithmic and architectural concerns by modeling tools as model transformations. An example nanoscale architecture template called R2D NASIC is developed using this framework and evaluated. Results show improvements in density, performance and max throughput pipelines compared to a baseline. Overall, the model-driven approach seeks to provide a common vocabulary and design flow for tackling challenges in physical design for emerging nanotechnologies.
Colored petri nets theory and applicationsAbu Hussein
This document discusses colored Petri nets (CP-nets) and their applications. CP-nets combine Petri nets with programming languages to model systems involving concurrency, communication, and resource sharing. They allow for simulation and formal verification. The document provides examples of CP-net applications in various domains including protocols, software, hardware, control systems, and military systems. It also describes how CP-net models can be used to automatically generate code for system implementations.
Simulation Data Management using Aras and SharePointAras
This document describes Advatech Pacific's solution for NASA to manage simulation data and requirements for mission design. The solution uses Aras Innovator to implement a Simulation Bill of Materials (SBOM) for linking analysis models. It also integrates with SharePoint and links requirements to analyses. Tight and loose integration of CAD and analysis tools like SolidWorks, STK, and Thermal Desktop were demonstrated.
Ruleml2012 - A production rule-based framework for causal and epistemic reaso...RuleML
The document describes a production rule-based framework for causal and epistemic reasoning. The framework combines event calculus foundations with a discrete event calculus (DECKT) to perform epistemic reasoning about events, knowledge, and time. It uses a rule-based forward-chaining production system to implement the framework and enable online/offline reasoning about dynamic domains.
Chemical Databases and Open Chemistry on the DesktopMarcus Hanwell
The modern chemist has access to large databases containing both experimental and calculated data. The power of HPC resources continues to increase, with more practitioners having routine access to powerful computational chemistry tools. This places an increasingly high burden on users to assimilate these resources into their workflow in order to effectively utilize resources. The creation of an open, extensible application framework that puts computational tools, data, and domain specific knowledge at the fingertips of chemists is increasingly important. A data-centric approach to chemistry, storing all data in a searchable database, will empower users to efficiently collaborate, innovate, and push the frontiers of research. Providing an open, user-friendly and extensible application will open up new tools to experimental chemists, while providing computational chemists the ability to address greater challenges. Additionally, by distributing experimental and computational data across the research community, incorporating cheminformatics analytics techniques, and providing visual search for chemical structures, the workflow of both groups can be significantly improved. This requires suitable data formats for data exchange, and databases with appropriate APIs for querying, and uploading data in order to effectively share. This talk will discuss recent progress made in developing a suite of open chemistry applications on the desktop. The applications can query online databases, such as the NIH structure resolver service, download and manipulate structures, and prepare input files for standalone computational chemistry codes. Another application developed to submit jobs, monitor and retrieve results from HPC resources will also be shown, and a desktop chemistry database browser. The Quixote project aims to establish standards for data exchange in computational chemistry, along with data repositories for organizations. 
Establishing these standards is important to promote open, reproducible chemistry, and their integration into user-friendly desktop applications will promote their integration in the standard workflow of researchers.
Data Pipelining and Workflow Management for Materials Science Applications
1. Data Pipelining and Workflow Management for Materials Science Applications
Dr George Fitzgerald
Dr Mathew Halls
Dr Jacob Gavartin
Dr Gerhard Goldbeck-Wood
Accelrys, Inc.