The Open Chemistry project is developing an ambitious platform to facilitate reproducible quantum chemistry workflows by integrating the best-of-breed open source projects currently available into a cohesive platform, with extensions specific to the needs of quantum chemistry. The core of the project is a Python-based data server capable of storing metadata, executing quantum chemistry calculations, and processing their output. The platform exposes language-agnostic RESTful web endpoints and uses Linux container technology to package quantum codes that are often difficult to build.
The Jupyter project has been leveraged as a web-based frontend with reproducibility as a core principle. It has been coupled with the data server to initiate quantum chemistry calculations, cache results, make them searchable, and visualize them within a modern browser environment. The Avogadro libraries have been reused for visualization workflows, coupled with Open Babel for file translation, and examples using NWChem and Psi4 will be demonstrated.
The core of the platform is built upon JSON data standards, encouraging wider adoption of JSON and HDF5 as the principal storage media. A single-page web application built around React will be shown for sharing simple views of data output and linking to the Jupyter notebooks that document how they were made. Command line tools and links to the Avogadro graphical interface will demonstrate capabilities from the web through to the desktop.
2. What Is Open Chemistry?
● Umbrella of related projects to coordinate and group
○ Focus on 3-clause BSD permissively licensed projects
○ Aims for more complete solution
● Initially three related projects
○ Avogadro 2 - editor, visualization, interaction with small number of molecules
○ MoleQueue - running computational jobs, abstracting local and remote execution
○ MongoChem - database for interacting with many molecules, summarizing data, informatics
● Evolved over the years but still retains many of those goals
○ GitHub organization with 35 repositories at the last count
● Umbrella organization in Google Summer of Code
○ Four years, with 3, 7, 7, and TBD students over a broad range of projects
○ Hope to continue this and other community engagement activities
https://openchemistry.org/
3. Why Jupyter?
● Supports interactive analysis while preserving the analytic steps
○ Preserves much of the provenance
● Familiar environment and language
○ Many are already familiar with the environment
○ Python is the language of scientific computing
● Simple extension mechanism
○ Particularly with JupyterLab
○ Allows for complex domain specific visualization
● Vibrant ecosystem and community
4. Open Chemistry, Avogadro, Jupyter and Web
● Making data more accessible
● Federated, open data repositories
● Modern HTML5 interfaces
● JSON data format for NWChem data as a prototype, add to other QM codes
● What about working with the data?
● Can we have chemistry from desktop to phone?
○ Create data, upload, organize
○ Search and analyze data
○ Share data - email, social media, publications
● What if we tied a data server to a Jupyter notebook?
● Can we make data a first class citizen in modern workflows?
7. Increased Reusability
● Benefit from a huge number of open source packages/projects
● Quantum chemistry codes
○ NWChem, Psi4, ...
● Open source libraries/utilities
○ Avogadro, Open Babel, cclib, RDKit, ...
● Visualization, charting, etc
○ vtk.js, 3DMol.js, D3, plotly, matplotlib, ...
● Web frameworks
○ React, stencil.js, npm, ...
● Languages
○ C++, Python, JavaScript, TypeScript, ...
● Containers
○ Docker, singularity, shifter, ...
Also version control such as git, continuous integration such as CircleCI, build systems such as CMake, project hosting such as GitHub, hardware-accelerated rendering such as WebGL, queuing systems like grid engine, semantic data stores like Jena, format standards such as JSON, MessagePack, HDF5, XML, and HTTP, RESTful web service standards, servers such as nginx, CherryPy, and Flask, and many other components that are used directly or gave useful input.
8. Increased Reusability
● Developed on GitHub under permissive OSI-approved licenses
○ Industry standard 3-clause BSD and Apache 2 mainly
● Web widgets using stencil.js to offer web tags
● Binary wheels for Python wrapped Avogadro core
○ pip install avogadro
● Pip installable Python modules for standard functions
○ pip install openchemistry
● JupyterLab extensions that can be installed locally
● Binder for “live” notebooks hosted in cloud containers
● Quantum codes and machine learning models in Docker containers
● Establishing data standards for reliable data exchange
9. Approach and Philosophy
● Data is the core of the platform
○ Start with a simple but powerful data model and data server
● RESTful APIs are ubiquitous
○ Use from notebooks, apps, command line, desktop, etc (see the sketch below)
● Jupyter notebooks for interactive analysis
○ High level domain specific Python API within the notebooks
● Web application
○ Authentication, access control, management tasks
○ Launching, searching, managing notebooks
○ Interact with data outside of the notebook
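Because the server exposes its functionality over RESTful endpoints, any HTTP client can drive it. The snippet below is a minimal sketch of that idea from Python using requests; the base URL, routes, payload fields, and authentication header are assumptions for illustration, not the server's documented API.

```python
# Minimal sketch of driving the data server over its RESTful API from a notebook
# or script. The URL, routes, payload fields, and auth header are illustrative
# assumptions, not the actual documented endpoints.
import requests

BASE = "https://example.org/api/v1"              # hypothetical server URL
HEADERS = {"Authorization": "Bearer <token>"}    # assumed token-based auth scheme

# Upload a molecule (Chemical JSON payload elided), then request a calculation on it.
mol = requests.post(f"{BASE}/molecules",
                    json={"cjson": {"chemicalJson": 1}},
                    headers=HEADERS).json()

calc = requests.post(f"{BASE}/calculations",
                     json={"moleculeId": mol["_id"],          # assumed MongoDB-style id field
                           "image": "openchemistry/psi4:latest",
                           "parameters": {"theory": "dft", "basis": "6-31g"}},
                     headers=HEADERS).json()

# Later, poll the same endpoint to check progress and retrieve the output.
result = requests.get(f"{BASE}/calculations/{calc['_id']}", headers=HEADERS).json()
```

The same calls work equally well from a command line tool, a desktop application, or the web frontend, which is the point of standardizing on REST.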
23. Reproducibility for Chemical-Physics Data
● Dream - share results like we can currently share code
● Links to interactive pages displaying data
● Those pages link to workflows/Jupyter notebooks
● From input geometry/molecule through to final figure
● Docker containers offer known, reproducible binary
○ Metadata has input parameters, container ID, etc
● Aid reproducibility, machine learning, and education
● Federate access, offer full worked examples - editable!
24. Docker Containers for Chemical-Physics
● Developed three containers so far to serve the platform
○ NWChem and Psi4 for computational chemistry
○ ChemML for machine learning
● These containers are self-contained workflow tools
○ Take JSON and input geometry
○ Use a Python-based execution script (sketched below)
○ Output JSON and optionally all output logs/data
● Run using Docker, Singularity, and soon Shifter: on AWS, locally, and at NERSC
● Simple contract making it easy to add more codes to the platform
○ Take some standard input, translate for your code, translate to standard output
○ Get workflow management, integration with Jupyter, visualization, ...
● The Dockerfile has build instructions, DockerHub hosts images
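A minimal sketch of the contract described above, assuming a hypothetical helper name: the real driver scripts in the openchemistry containers differ in detail, but each one reads an XYZ geometry and JSON parameters, runs the packaged code, and writes Chemical JSON.

```python
# Sketch of a container driver script implementing the simple contract:
# standard input (geometry + JSON parameters) in, Chemical JSON out.
# run_calculation() is a hypothetical placeholder for the code-specific step.
import argparse
import json


def run_calculation(xyz, params, scratch):
    # Placeholder: translate the standard input into the quantum code's native
    # input, execute it in the scratch directory, and translate the results
    # back into a Chemical JSON-style dictionary.
    return {"chemicalJson": 1, "metadata": {"parameters": params}}


def main():
    parser = argparse.ArgumentParser(description="Container driver script")
    parser.add_argument("-g", "--geometry", required=True)    # input geometry (.xyz)
    parser.add_argument("-p", "--parameters", required=True)  # calculation parameters (.json)
    parser.add_argument("-o", "--output", required=True)      # output file (.cjson)
    parser.add_argument("-s", "--scratch", default="/tmp")    # scratch directory
    args = parser.parse_args()

    with open(args.geometry) as f:
        xyz = f.read()
    with open(args.parameters) as f:
        params = json.load(f)

    result = run_calculation(xyz, params, args.scratch)

    with open(args.output, "w") as f:
        json.dump(result, f, indent=2)


if __name__ == "__main__":
    main()
```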
26. Running a Psi4 Docker Container
● Can be run independently of the framework
● docker run -v $(pwd):/data openchemistry/psi4:latest
○ -g /data/geometry.xyz
○ -p /data/parameters.json
○ -o /data/out.cjson
○ -s /data/scratch
● Runs a Python driver script that interprets switches
● Perform input/output translation, input generation, etc
● Packages a code for use in a larger workflow
27. Running a NWChem Docker Container
● Can be run independently of the framework
● docker run -v $(pwd):/data openchemistry/nwchem:latest
○ -g /data/geometry.xyz
○ -p /data/parameters.json
○ -o /data/out.cjson
○ -s /data/scratch
● Runs a Python driver script that interprets switches
● Perform input/output translation, input generation, etc
● Packages a code for use in a larger workflow (see the wrapper sketch below)
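Because the container interface is just a command with a handful of switches, it is easy to wrap programmatically. The snippet below assembles the exact invocation from the slides using Python's subprocess module; it is an illustrative wrapper, not part of the platform itself.

```python
# Illustrative wrapper that assembles the docker invocation shown above.
# The image tag and switches come from the slides; the wrapper itself is a sketch.
import subprocess
from pathlib import Path

workdir = Path.cwd()  # directory containing geometry.xyz and parameters.json
cmd = [
    "docker", "run", "-v", f"{workdir}:/data",
    "openchemistry/psi4:latest",
    "-g", "/data/geometry.xyz",
    "-p", "/data/parameters.json",
    "-o", "/data/out.cjson",
    "-s", "/data/scratch",
]
subprocess.run(cmd, check=True)  # out.cjson appears in workdir when the run completes
```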
28. Export to Binder
● Goes beyond simply showing the static notebook
● Specific GitHub repository layout
○ Install custom Python modules
○ Install JupyterLab extensions
● Service builds a container on the fly
● Can click on a link and run the example container
http://mybinder.org/v2/gh/openchemistry/jupyter-examples/master?urlpath=lab/tree/caffeine.ipynb
30. Machine Learning
● What happens after your model is trained and published?
● Can we treat machine learning models like other codes making predictions?
● Lots of new moving parts that need to be managed
○ The actual machine learning code, possible accelerator access, etc
○ The trained model, loading it, executing it reproducibly
○ Generation of relevant descriptors as part of the input
○ Extracting output, storing, displaying, and visualizing data
● Starts to share a number of commonalities with other simulations
● Important differences too
○ Narrower focus for most models
○ Possibility to augment trained models, create derived models
32. Data Mining
● When running calculations, all data, metadata, and workflows are captured
● Creation of a structured data store with a friendly frontend
● Possible to perform queries and analytics on the generated data (see the sketch below)
● Machine learning can feed off of this data
○ Reuse the same infrastructure to initiate and generate new data
○ Comparison of predicted data to computational codes, experimental data
○ Use of a familiar JupyterLab interface
● Augmenting the notebook with a data server that can access compute
○ Notebook acts as initiator for large jobs
○ Returning to the notebook later to check on progress
● Independent RESTful APIs, web frontend, batch export of data
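As a rough illustration of that analytics loop, the sketch below queries the data server for completed calculations and charts one property. The routes and field names are assumptions for illustration, not the server's documented API.

```python
# Illustrative analytics pass over captured results: query the data server for
# completed calculations and look at the distribution of a scalar property.
# Routes and field names are assumed, not the actual documented endpoints.
import requests
import matplotlib.pyplot as plt

BASE = "https://example.org/api/v1"              # hypothetical server URL
HEADERS = {"Authorization": "Bearer <token>"}    # assumed token-based auth scheme

calcs = requests.get(f"{BASE}/calculations",
                     params={"status": "complete", "limit": 500},
                     headers=HEADERS).json()

# Pull one scalar property out of each result and plot its distribution.
energies = [c["properties"]["totalEnergy"] for c in calcs if "properties" in c]
plt.hist(energies, bins=50)
plt.xlabel("Total energy")
plt.show()
```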
33. Chemical JSON
● Developed to support projects (~2011)
● Stores structure, geometry, identifiers, descriptors, and other useful data (illustrative example below)
● Benefits:
○ More compact than XML/CML
○ Native to MongoDB, JSON-RPC, REST
○ Easily converted to binary representation
● Now features basis sets, MOs, sets
● MessagePack a good option for binary
● Maps easily to HDF5 binary data store
● MolSSI JSON schema collaboration
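As a rough illustration (written as a Python dict), a Chemical JSON-style document for water might look like the sketch below; the exact key names have evolved with the schema, so treat this as illustrative rather than normative. It also shows how naturally the same structure converts to a binary representation with MessagePack.

```python
# Illustrative Chemical JSON-style document for water; key names are a sketch
# of the general shape, not the current schema definition.
import json
import msgpack  # pip install msgpack

water = {
    "chemicalJson": 1,
    "name": "water",
    "atoms": {
        "elements": {"number": [8, 1, 1]},          # atomic numbers: O, H, H
        "coords": {"3d": [0.000,  0.000,  0.117,    # x, y, z per atom (Angstrom)
                          0.000,  0.757, -0.469,
                          0.000, -0.757, -0.469]},
    },
    "bonds": {
        "connections": {"index": [0, 1, 0, 2]},     # flat list of bonded atom-index pairs
        "order": [1, 1],
    },
    "properties": {"totalCharge": 0},
}

as_json = json.dumps(water)        # human-readable text for REST/MongoDB
as_binary = msgpack.packb(water)   # compact binary form (MessagePack)
print(len(as_json), len(as_binary))
```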
34. Papers and a Little History on Chemical JSON
● Quixote collaboration with Peter Murray-Rust (2011)
○ “The Quixote project: Collaborative and Open Quantum Chemistry data management in the Internet age”, https://doi.org/10.1186/1758-2946-3-38
● Early work in CML with NWChem and Avogadro (2013)
○ “From data to analysis: linking NWChem and Avogadro with the syntax and semantics of Chemical Markup Language”, https://doi.org/10.1186/1758-2946-5-25
● Later moved to JSON, RESTful API, visualization (2017)
○ “Open chemistry: RESTful web APIs, JSON, NWChem and the modern web application”, https://doi.org/10.1186/s13321-017-0241-z
● Interested in Linked Data, JSON-LD, and how they might be layered on top
● Use of BSON, HDF5, and related technologies for binary data
● BSD licensed reference implementations
35. Pillars of Phase II SBIR Project
1. Data and metadata
○ JSON, JSON-LD, HDF5 and semantic web
2. Server platform
○ RESTful APIs, computational chemistry, data, machine learning, HPC/cloud, and triple store
3. Jupyter integration
○ Computational chemistry, data, machine learning, query, analytics, and data visualization
4. Web application
○ Management interfaces, single-page interface, notebook/data browser, and search
5. Avogadro and local Python
○ Python shell integration, extension of Avogadro to use server interface, editing data on server
Regular automated software deployments, releases with Docker containers
36. Closing Thoughts
● Nearly halfway through the Phase II project
● Data and software are both central and core to the platform
● Highly reusable through licensing, modular nature, data standards, containers
● Augmented by abstracted access to compute resources
● Open source, developing entry points for customization and extension
● Building on best-of-breed open source community projects
● Extending to better support the chemistry community
○ Just at the start of making machine learning and data mining first class citizens
● User friendly interfaces, Python at the core, visualization, data analytics
● SBIR funding from DOE Office of Science contract DE-SC0017193
○ Collaborating with Bert de Jong at Berkeley Lab and Johannes Hachmann at SUNY Buffalo