Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
Reproducible Quantum Chemistry
Dr. Marcus D. Hanwell
@mhanwell
Technical Leader
American Chemical Society
Orlando, FL
31 March, 2019
What Is Open Chemistry?
● Umbrella of related projects to coordinate and group
○ Focus on 3-clause BSD permissively licensed projects
○ Aims for a more complete solution
● Initially three related projects
○ Avogadro 2 - editor, visualization, interaction with small numbers of molecules
○ MoleQueue - running computational jobs, abstracting local and remote execution
○ MongoChem - database for interacting with many molecules, summarizing data, informatics
● Evolved over the years but still retains many of those goals
○ GitHub organization with 35 repositories at the last count
● Umbrella organization in Google Summer of Code
○ Four years, with 3, 7, 7, and TBD students over a broad range of projects
○ Hope to continue this and other community engagement activities
https://openchemistry.org/
Why Jupyter?
● Supports interactive analysis while preserving the analytic steps
○ Preserves much of the provenance
● Familiar environment and language
○ Many are already familiar with the environment
○ Python is the language of scientific computing
● Simple extension mechanism
○ Particularly with JupyterLab
○ Allows for complex domain-specific visualization
● Vibrant ecosystem and community
Open Chemistry, Avogadro, Jupyter and Web
● Making data more accessible
● Federated, open data repositories
● Modern HTML5 interfaces
● JSON data format prototyped with NWChem data, to be added to other QM codes
● What about working with the data?
● Can we have chemistry from desktop to phone?
○ Create data, upload, organize
○ Search and analyze data
○ Share data - email, social media, publications
● What if we tied a data server to a Jupyter notebook?
● Can we make data a first-class citizen in modern workflows?
Increased Reusability
● Benefit from a huge number of open source packages/projects
● Quantum chemistry codes
○ NWChem, Psi4, ...
● Open source libraries/utilities
○ Avogadro, Open Babel, cclib, RDKit, ...
● Visualization, charting, etc.
○ vtk.js, 3Dmol.js, D3, Plotly, matplotlib, ...
● Web frameworks
○ React, stencil.js, npm, ...
● Languages
○ C++, Python, JavaScript, TypeScript, ...
● Containers
○ Docker, Singularity, Shifter, ...
Also version control such as Git, continuous integration such as CircleCI, build systems such as CMake, project hosting such as GitHub, hardware-accelerated rendering such as WebGL, queuing systems such as Grid Engine, semantic data stores such as Jena, format standards such as JSON, MessagePack, HDF5, and XML, HTTP and RESTful web service standards, servers such as nginx, CherryPy, and Flask, and many other components that are used directly or gave useful input.
Increased Reusability
● Developed on GitHub under permissive OSI-approved licenses
○ Industry standard 3-clause BSD and Apache 2 mainly
● Web widgets using stencil.js to offer web tags
● Binary wheels for the Python-wrapped Avogadro core
○ pip install avogadro
● pip-installable Python modules for standard functions
○ pip install openchemistry
● JupyterLab extensions that can be installed locally
● Binder for “live” notebooks hosted in cloud containers
● Quantum codes and machine learning models in Docker containers
● Establishing data standards for reliable data exchange
Approach and Philosophy
● Data is the core of the platform
○ Start with a simple but powerful data model and data server
● RESTful APIs are ubiquitous
○ Use from notebooks, apps, command line, desktop, etc.
● Jupyter notebooks for interactive analysis
○ High-level, domain-specific Python API within the notebooks
● Web application
○ Authentication, access control, management tasks
○ Launching, searching, managing notebooks
○ Interact with data outside of the notebook
Reusable Web Visualization Widgets
Data, Python, Jupyter, Chemistry
Responsive Design
Getting the Platform
Containers and the Swarm
Reproducibility for Chemical-Physics Data
● Dream - share results like we can currently share code
● Links to interactive pages displaying data
● Those pages link to workflows/Jupyter notebooks
● From input geometry/molecule through to final figure
● Docker containers offer a known, reproducible binary
○ Metadata has input parameters, container ID, etc.
● Aid reproducibility, machine learning, and education
● Federate access, offer full worked examples - editable!
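A minimal sketch of the kind of metadata record described above, assuming a layout of our own: the field names (`image`, `image_digest`, `input_sha256`) are hypothetical and the digest string is a placeholder. Hashing a canonical (key-sorted) JSON form of the inputs gives two runs with identical geometry and parameters the same identifier.

```python
import hashlib
import json


def canonical_json(obj):
    """Serialize with sorted keys and no extra whitespace so equal inputs hash equally."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def provenance_record(geometry_xyz, parameters, image, image_digest):
    """Build a hypothetical provenance record for one containerized calculation."""
    payload = {"geometry": geometry_xyz, "parameters": parameters}
    input_hash = hashlib.sha256(canonical_json(payload).encode("utf-8")).hexdigest()
    return {
        "image": image,                # e.g. "openchemistry/psi4:latest"
        "image_digest": image_digest,  # pins the exact binary behind the tag
        "parameters": parameters,
        "input_sha256": input_hash,    # identifies the exact inputs used
    }


if __name__ == "__main__":
    water = "3\nwater\nO 0.0 0.0 0.117\nH 0.0 0.757 -0.467\nH 0.0 -0.757 -0.467\n"
    params = {"task": "energy", "theory": "scf", "basis": "cc-pvdz"}
    record = provenance_record(water, params, "openchemistry/psi4:latest", "sha256:<digest>")
    print(json.dumps(record, indent=2))
```

Recording the image digest as well as the tag matters because a tag such as `latest` can later point to a different build.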
Docker Containers for Chemical-Physics
● Developed three containers so far to serve the platform
○ NWChem and Psi4 for computational chemistry
○ ChemML for machine learning
● These containers are self-contained workflow tools
○ Take JSON parameters and an input geometry
○ Use a Python-based execution script
○ Output JSON and optionally all output logs/data
● Run using Docker, Singularity, and soon Shifter; on AWS, locally, and at NERSC
● Simple contract making it easy to add more codes to the platform
○ Take some standard input, translate for your code, translate to standard output
○ Get workflow management, integration with Jupyter, visualization, ...
● The Dockerfile has the build instructions; Docker Hub hosts the images
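The "simple contract" can be illustrated by the translation step for one code. This is a sketch under assumptions: the standard parameter names (`task`, `theory`, `basis`) are hypothetical, and the output is only a minimal Psi4 input, not what the real driver scripts generate.

```python
def parse_xyz(text):
    """Return (symbol, x, y, z) tuples from XYZ-format text."""
    lines = text.strip().splitlines()
    natoms = int(lines[0])  # first line: atom count; second line: comment
    atoms = []
    for line in lines[2:2 + natoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms


def to_psi4_input(atoms, params):
    """Translate hypothetical standard parameters into a minimal Psi4 input."""
    geometry = "\n".join(f"  {s} {x:.6f} {y:.6f} {z:.6f}" for s, x, y, z in atoms)
    task = params.get("task", "energy")  # assumed values: "energy" or "optimize"
    theory = params.get("theory", "scf")
    basis = params.get("basis", "cc-pvdz")
    return (
        "molecule {\n"
        f"{geometry}\n"
        "}\n\n"
        f"set basis {basis}\n\n"
        f"{task}('{theory}')\n"
    )


if __name__ == "__main__":
    xyz = "2\nhydrogen\nH 0.0 0.0 0.0\nH 0.0 0.0 0.74\n"
    params = {"task": "energy", "theory": "scf", "basis": "sto-3g"}
    print(to_psi4_input(parse_xyz(xyz), params))
```

Keeping this translation in one small, code-specific function is what keeps adding a code cheap: only this step and its inverse (parsing the code's output back to standard JSON, not shown) change per code.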
Psi4 Dockerfile
Running a Psi4 Docker Container
● Can be run independently of the framework
● docker run -v $(pwd):/data openchemistry/psi4:latest \
    -g /data/geometry.xyz \
    -p /data/parameters.json \
    -o /data/out.cjson \
    -s /data/scratch
● Runs a Python driver script that interprets switches
● Performs input/output translation, input generation, etc.
● Packages a code for use in a larger workflow
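The driver's handling of the `-g`, `-p`, `-o`, and `-s` switches might begin with argument parsing along these lines. Only the four short switches come from the slides; the long option names, the default scratch path, and the placeholder output are assumptions, and the step that actually runs the code is deliberately left out.

```python
import argparse
import json
import os


def build_parser():
    """Argument parser for a hypothetical containerized calculation driver."""
    parser = argparse.ArgumentParser(description="Calculation driver (sketch)")
    parser.add_argument("-g", "--geometry", required=True, help="input geometry (XYZ)")
    parser.add_argument("-p", "--parameters", required=True, help="JSON parameter file")
    parser.add_argument("-o", "--output", required=True, help="output Chemical JSON path")
    parser.add_argument("-s", "--scratch", default="/tmp/scratch", help="scratch directory")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    with open(args.parameters) as f:
        params = json.load(f)
    with open(args.geometry) as f:
        geometry = f.read()
    os.makedirs(args.scratch, exist_ok=True)
    # Placeholder: a real driver would generate the code's input here, run the
    # code, and translate its output to Chemical JSON. We only echo the inputs.
    result = {"parameters": params, "geometry_lines": len(geometry.splitlines())}
    with open(args.output, "w") as f:
        json.dump(result, f)
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
```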
Running a NWChem Docker Container
● Can be run independently of the framework
● docker run -v $(pwd):/data openchemistry/nwchem:latest \
    -g /data/geometry.xyz \
    -p /data/parameters.json \
    -o /data/out.cjson \
    -s /data/scratch
● Runs a Python driver script that interprets switches
● Performs input/output translation, input generation, etc.
● Packages a code for use in a larger workflow
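Because the Psi4 and NWChem containers accept the same switches, a notebook or workflow tool could build either command from one function. A sketch with a hypothetical function name: it only assembles the argument list and does not call Docker, and an explicit host directory replaces the slides' `$(pwd)` because no shell is involved.

```python
import shlex


def docker_command(code, host_dir, tag="latest"):
    """Return the docker run argument list for an openchemistry/<code> image."""
    return [
        "docker", "run",
        "-v", f"{host_dir}:/data",  # mount the host directory at /data
        f"openchemistry/{code}:{tag}",
        "-g", "/data/geometry.xyz",
        "-p", "/data/parameters.json",
        "-o", "/data/out.cjson",
        "-s", "/data/scratch",
    ]


if __name__ == "__main__":
    cmd = docker_command("nwchem", "/home/user/calc")
    print(shlex.join(cmd))  # printable form; pass the list to subprocess.run to execute
```

Passing a list (rather than a shell string) to `subprocess.run` avoids quoting problems with unusual directory names.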
Export to Binder
● Goes beyond simply showing the static notebook
● Requires a specific GitHub repository layout
○ Install custom Python modules
○ Install JupyterLab extensions
● The service builds a container on the fly
● Click a link to run the example in a container
http://mybinder.org/v2/gh/openchemistry/jupyter-examples/master?urlpath=lab/tree/caffeine.ipynb
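The launch link above follows a regular pattern (`/v2/gh/<org>/<repo>/<ref>` plus a `urlpath` that opens a file in JupyterLab), so it can be generated for any notebook in a repository. A sketch with a hypothetical helper name; it uses `https`, which mybinder.org serves, rather than the `http` in the slide. The repository layout itself is read by repo2docker from files such as `requirements.txt` (Python modules) and `postBuild` (extra setup commands, such as installing JupyterLab extensions).

```python
from urllib.parse import quote


def binder_url(org, repo, ref, notebook):
    """Build a mybinder.org link that opens `notebook` in JupyterLab."""
    # quote() keeps "/" by default, so notebooks in subdirectories still work.
    return (
        f"https://mybinder.org/v2/gh/{org}/{repo}/{ref}"
        f"?urlpath=lab/tree/{quote(notebook)}"
    )


if __name__ == "__main__":
    print(binder_url("openchemistry", "jupyter-examples", "master", "caffeine.ipynb"))
```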
Export to Binder
Machine Learning
● What happens after your model is trained and published?
● Can we treat machine learning models like other codes making predictions?
● Lots of new moving parts that need to be managed
○ The actual machine learning code, possible accelerator access, etc
○ The trained model, loading it, executing it reproducibly
○ Generation of relevant descriptors as part of the input
○ Extracting output, storing, displaying, and visualizing data
● Starts to share a number of commonalities with other simulations
● Important differences too
○ Narrower focus for most models
○ Possibility to augment trained models, create derived models
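To make "treating a model like another code" concrete, this sketch wraps a model behind the same JSON-out contract as the simulation containers, with descriptor generation as part of the input step and the model version recorded for reproducibility. Everything here is hypothetical: the element-count descriptor and the linear "model" with arbitrary weights stand in for a real trained model (for example, one built with ChemML) and produce no meaningful predictions.

```python
import json
from collections import Counter


def element_counts(atoms):
    """Toy descriptor: number of atoms of each element."""
    return Counter(symbol for symbol, *_ in atoms)


class LinearModel:
    """Stand-in for a trained model: a weighted sum of descriptor values."""

    def __init__(self, weights, bias=0.0, version="0.0.0"):
        self.weights = weights
        self.bias = bias
        self.version = version  # which trained model produced the prediction

    def predict(self, descriptors):
        return self.bias + sum(self.weights.get(k, 0.0) * v for k, v in descriptors.items())


def run_model(model, atoms):
    """Generate descriptors, predict, and return standard JSON output."""
    descriptors = element_counts(atoms)
    return json.dumps({
        "model_version": model.version,
        "descriptors": dict(sorted(descriptors.items())),
        "prediction": model.predict(descriptors),
    })
```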
Running ChemML in a Jupyter Notebook
Data Mining
● When running calculations, all data, metadata, and workflows are captured
● Creation of a structured data store with a friendly frontend
● Possible to query and run analytics on the data generated
● Machine learning can feed off this data
○ Reuse the same infrastructure to initiate and generate new data
○ Compare predicted data to computational codes and experimental data
○ Use of a familiar JupyterLab interface
● Augmenting the notebook with a data server that can access compute resources
○ Notebook acts as initiator for large jobs
○ Returning to the notebook later to check on progress
● Independent RESTful APIs, web frontend, batch export of data
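Once calculations are captured with their metadata, many queries reduce to filters over structured records, and comparing predictions with computed values is a join on the molecule. This sketch uses an in-memory list as a hypothetical stand-in for the data server; the record fields (`molecule`, `source`, `value`) are assumptions and the numbers in the test are dummy values.

```python
def find(records, **criteria):
    """Return records whose fields match every keyword criterion."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def differences(records, source_a, source_b, field="value"):
    """Per-molecule difference between two sources, e.g. a model and a QM code."""
    by_source = {}
    for r in records:
        by_source.setdefault(r["source"], {})[r["molecule"]] = r[field]
    a = by_source.get(source_a, {})
    b = by_source.get(source_b, {})
    # Only molecules present in both sources can be compared.
    return {m: a[m] - b[m] for m in sorted(a.keys() & b.keys())}
```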
Chemical JSON
● Developed to support projects (~2011)
● Stores structure, geometry, identifiers, descriptors, and other useful data
● Benefits:
○ More compact than XML/CML
○ Native to MongoDB, JSON-RPC, REST
○ Easily converted to binary representation
● Now features basis sets, MOs, sets
● MessagePack is a good option for binary encoding
● Maps easily to HDF5 binary data store
● MolSSI JSON schema collaboration
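To make the Chemical JSON description concrete, here is a sketch that converts XYZ text into a dictionary loosely modeled on the layout of Avogadro's `.cjson` files (`atoms.elements.number`, and `atoms.coords.3d` as a flattened x, y, z list). Treat the exact keys as an assumption to check against the Chemical JSON schema; the element table is deliberately truncated. The second function shows why a flat coordinate array converts easily to a binary representation.

```python
import json
from array import array

# Truncated lookup table; a real converter would cover the whole periodic table.
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def xyz_to_cjson(text, name=""):
    """Convert XYZ text to a Chemical JSON-style dictionary."""
    lines = text.strip().splitlines()
    natoms = int(lines[0])
    numbers, coords = [], []
    for line in lines[2:2 + natoms]:
        symbol, x, y, z = line.split()[:4]
        numbers.append(ATOMIC_NUMBERS[symbol])
        coords.extend([float(x), float(y), float(z)])  # flattened x, y, z triples
    return {
        "chemicalJson": 1,
        "name": name,
        "atoms": {
            "elements": {"number": numbers},
            "coords": {"3d": coords},
        },
    }


def coords_to_bytes(cjson):
    """Pack the flattened coordinates as float64 values for a binary store."""
    return array("d", cjson["atoms"]["coords"]["3d"]).tobytes()


if __name__ == "__main__":
    water = "3\nwater\nO 0.0 0.0 0.117\nH 0.0 0.757 -0.467\nH 0.0 -0.757 -0.467\n"
    print(json.dumps(xyz_to_cjson(water, name="water"), indent=2))
```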
Papers and a Little History on Chemical JSON
● Quixote collaboration with Peter Murray-Rust (2011)
○ “The Quixote project: Collaborative and Open Quantum Chemistry data management in the Internet age”, https://doi.org/10.1186/1758-2946-3-38
● Early work in CML with NWChem and Avogadro (2013)
○ “From data to analysis: linking NWChem and Avogadro with the syntax and semantics of Chemical Markup Language”, https://doi.org/10.1186/1758-2946-5-25
● Later moved to JSON, RESTful API, visualization (2017)
○ “Open chemistry: RESTful web APIs, JSON, NWChem and the modern web application”, https://doi.org/10.1186/s13321-017-0241-z
● Exploring Linked Data and JSON-LD, and how they might be layered on top of Chemical JSON
● Use of BSON, HDF5, and related technologies for binary data
● BSD licensed reference implementations
Pillars of Phase II SBIR Project
1. Data and metadata
○ JSON, JSON-LD, HDF5 and semantic web
2. Server platform
○ RESTful APIs, computational chemistry, data, machine learning, HPC/cloud, and triple store
3. Jupyter integration
○ Computational chemistry, data, machine learning, query, analytics, and data visualization
4. Web application
○ Management interfaces, single-page interface, notebook/data browser, and search
5. Avogadro and local Python
○ Python shell integration, extension of Avogadro to use server interface, editing data on server
Regular automated software deployments, releases with Docker containers
Closing Thoughts
● Nearly halfway through the Phase II project
● Data and software are both central to the platform
● Highly reusable through licensing, modular nature, data standards, containers
● Augmented by abstracted access to compute resources
● Open source, developing entry points for customization and extension
● Building on best-of-breed open source community projects
● Extending to better support the chemistry community
○ Just at the start of making machine learning and data mining first-class citizens
● User friendly interfaces, Python at the core, visualization, data analytics
● SBIR funding from DOE Office of Science contract DE-SC0017193
○ Collaborating with Bert de Jong at Berkeley Lab and Johannes Hachmann at SUNY Buffalo

More Related Content

What's hot

Go at uber
Go at uberGo at uber
Go at uber
Rob Skillington
 
24 uses for perl6
24 uses for perl624 uses for perl6
24 uses for perl6
Simon Proctor
 
PyCon Poland 2016: Maintaining a high load Python project: typical mistakes
PyCon Poland 2016: Maintaining a high load Python project: typical mistakesPyCon Poland 2016: Maintaining a high load Python project: typical mistakes
PyCon Poland 2016: Maintaining a high load Python project: typical mistakes
Viach Kakovskyi
 
Avogadro: Open Source Libraries and Application for Computational Chemistry
Avogadro: Open Source Libraries and Application for Computational ChemistryAvogadro: Open Source Libraries and Application for Computational Chemistry
Avogadro: Open Source Libraries and Application for Computational Chemistry
Marcus Hanwell
 
Ceph Day Chicago: Using Ceph for Large Hadron Collider Data
Ceph Day Chicago: Using Ceph for Large Hadron Collider Data Ceph Day Chicago: Using Ceph for Large Hadron Collider Data
Ceph Day Chicago: Using Ceph for Large Hadron Collider Data
Ceph Community
 
Bsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsdBsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsd
Scott Tsai
 
Austin Python Meetup 2017: How to Stop Worrying and Start a Project with Pyth...
Austin Python Meetup 2017: How to Stop Worrying and Start a Project with Pyth...Austin Python Meetup 2017: How to Stop Worrying and Start a Project with Pyth...
Austin Python Meetup 2017: How to Stop Worrying and Start a Project with Pyth...
Viach Kakovskyi
 
Ergo platform's approach
Ergo platform's approachErgo platform's approach
Ergo platform's approach
Dmitry Meshkov
 
PrefetchML: a Framework for Prefetching and Caching Models
PrefetchML: a Framework for Prefetching and Caching ModelsPrefetchML: a Framework for Prefetching and Caching Models
PrefetchML: a Framework for Prefetching and Caching Models
Gwendal Daniel
 
BKK16-306 ART ii
BKK16-306 ART iiBKK16-306 ART ii
BKK16-306 ART ii
Linaro
 
Code Crime Scene pawel klimczyk
Code Crime Scene   pawel klimczykCode Crime Scene   pawel klimczyk
Code Crime Scene pawel klimczyk
Pawel Klimczyk
 

What's hot (11)

Go at uber
Go at uberGo at uber
Go at uber
 
24 uses for perl6
24 uses for perl624 uses for perl6
24 uses for perl6
 
PyCon Poland 2016: Maintaining a high load Python project: typical mistakes
PyCon Poland 2016: Maintaining a high load Python project: typical mistakesPyCon Poland 2016: Maintaining a high load Python project: typical mistakes
PyCon Poland 2016: Maintaining a high load Python project: typical mistakes
 
Avogadro: Open Source Libraries and Application for Computational Chemistry
Avogadro: Open Source Libraries and Application for Computational ChemistryAvogadro: Open Source Libraries and Application for Computational Chemistry
Avogadro: Open Source Libraries and Application for Computational Chemistry
 
Ceph Day Chicago: Using Ceph for Large Hadron Collider Data
Ceph Day Chicago: Using Ceph for Large Hadron Collider Data Ceph Day Chicago: Using Ceph for Large Hadron Collider Data
Ceph Day Chicago: Using Ceph for Large Hadron Collider Data
 
Bsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsdBsdtw17: george neville neil: realities of dtrace on free-bsd
Bsdtw17: george neville neil: realities of dtrace on free-bsd
 
Austin Python Meetup 2017: How to Stop Worrying and Start a Project with Pyth...
Austin Python Meetup 2017: How to Stop Worrying and Start a Project with Pyth...Austin Python Meetup 2017: How to Stop Worrying and Start a Project with Pyth...
Austin Python Meetup 2017: How to Stop Worrying and Start a Project with Pyth...
 
Ergo platform's approach
Ergo platform's approachErgo platform's approach
Ergo platform's approach
 
PrefetchML: a Framework for Prefetching and Caching Models
PrefetchML: a Framework for Prefetching and Caching ModelsPrefetchML: a Framework for Prefetching and Caching Models
PrefetchML: a Framework for Prefetching and Caching Models
 
BKK16-306 ART ii
BKK16-306 ART iiBKK16-306 ART ii
BKK16-306 ART ii
 
Code Crime Scene pawel klimczyk
Code Crime Scene   pawel klimczykCode Crime Scene   pawel klimczyk
Code Crime Scene pawel klimczyk
 

Similar to Open Chemistry, JupyterLab and data: Reproducible quantum chemistry

Data analysis with Pandas and Spark
Data analysis with Pandas and SparkData analysis with Pandas and Spark
Data analysis with Pandas and Spark
Felix Crisan
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
StampedeCon
 
WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...
WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...
WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...
WebCamp
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
Ahmed Ossama
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
Demi Ben-Ari
 
Spring Data Neo4j: Graph Power Your Enterprise Apps
Spring Data Neo4j: Graph Power Your Enterprise AppsSpring Data Neo4j: Graph Power Your Enterprise Apps
Spring Data Neo4j: Graph Power Your Enterprise Apps
GraphAware
 
Data Science as Scale
Data Science as ScaleData Science as Scale
Data Science as Scale
Conor B. Murphy
 
Python workshop
Python workshopPython workshop
Python workshop
Marie Behzadi
 
Python workshop
Python workshopPython workshop
Python workshop
Shiraz LUG
 
WebCamp Ukraine 2016: Instant messenger with Python. Back-end development
WebCamp Ukraine 2016: Instant messenger with Python. Back-end developmentWebCamp Ukraine 2016: Instant messenger with Python. Back-end development
WebCamp Ukraine 2016: Instant messenger with Python. Back-end development
Viach Kakovskyi
 
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
VMware Tanzu
 
Sebastian Hellmann
Sebastian HellmannSebastian Hellmann
Sebastian Hellmann
Connected Data World
 
Scala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadiusScala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadius
BoldRadius Solutions
 
Nikhil summer internship 2016
Nikhil   summer internship 2016Nikhil   summer internship 2016
Nikhil summer internship 2016
Nikhil Shekhar
 
Red hat infrastructure for analytics
Red hat infrastructure for analyticsRed hat infrastructure for analytics
Red hat infrastructure for analytics
Kyle Bader
 
The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013
scorlosquet
 
KEDL DBpedia 2019
KEDL DBpedia  2019KEDL DBpedia  2019
KEDL DBpedia 2019
Sebastian Hellmann
 
Big data @ Hootsuite analtyics
Big data @ Hootsuite analtyicsBig data @ Hootsuite analtyics
Big data @ Hootsuite analtyics
Claudiu Coman
 
Python in Industry
Python in IndustryPython in Industry
Python in Industry
Dharmit Shah
 
BISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple SpacesBISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple SpacesSrinath Perera
 

Similar to Open Chemistry, JupyterLab and data: Reproducible quantum chemistry (20)

Data analysis with Pandas and Spark
Data analysis with Pandas and SparkData analysis with Pandas and Spark
Data analysis with Pandas and Spark
 
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
Building a Next-gen Data Platform and Leveraging the OSS Ecosystem for Easy W...
 
WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...
WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...
WebCamp 2016: Python. Вячеслав Каковский: Real-time мессенджер на Python. Осо...
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Apache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-AriApache Spark 101 - Demi Ben-Ari
Apache Spark 101 - Demi Ben-Ari
 
Spring Data Neo4j: Graph Power Your Enterprise Apps
Spring Data Neo4j: Graph Power Your Enterprise AppsSpring Data Neo4j: Graph Power Your Enterprise Apps
Spring Data Neo4j: Graph Power Your Enterprise Apps
 
Data Science as Scale
Data Science as ScaleData Science as Scale
Data Science as Scale
 
Python workshop
Python workshopPython workshop
Python workshop
 
Python workshop
Python workshopPython workshop
Python workshop
 
WebCamp Ukraine 2016: Instant messenger with Python. Back-end development
WebCamp Ukraine 2016: Instant messenger with Python. Back-end developmentWebCamp Ukraine 2016: Instant messenger with Python. Back-end development
WebCamp Ukraine 2016: Instant messenger with Python. Back-end development
 
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
A Modern Interface for Data Science on Postgres/Greenplum - Greenplum Summit ...
 
Sebastian Hellmann
Sebastian HellmannSebastian Hellmann
Sebastian Hellmann
 
Scala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadiusScala Days Highlights | BoldRadius
Scala Days Highlights | BoldRadius
 
Nikhil summer internship 2016
Nikhil   summer internship 2016Nikhil   summer internship 2016
Nikhil summer internship 2016
 
Red hat infrastructure for analytics
Red hat infrastructure for analyticsRed hat infrastructure for analytics
Red hat infrastructure for analytics
 
The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013The Semantic Web and Drupal 7 - Loja 2013
The Semantic Web and Drupal 7 - Loja 2013
 
KEDL DBpedia 2019
KEDL DBpedia  2019KEDL DBpedia  2019
KEDL DBpedia 2019
 
Big data @ Hootsuite analtyics
Big data @ Hootsuite analtyicsBig data @ Hootsuite analtyics
Big data @ Hootsuite analtyics
 
Python in Industry
Python in IndustryPython in Industry
Python in Industry
 
BISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple SpacesBISSA: Empowering Web gadget Communication with Tuple Spaces
BISSA: Empowering Web gadget Communication with Tuple Spaces
 

Recently uploaded

(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
Scintica Instrumentation
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
Health Advances
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
muralinath2
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Ana Luísa Pinho
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
Richard Gill
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
Columbia Weather Systems
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
muralinath2
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
AADYARAJPANDEY1
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
muralinath2
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptx
muralinath2
 
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
4. An Overview of Sugarcane White Leaf Disease in Vietnam.pdf
ssuserbfdca9
 
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATIONPRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
PRESENTATION ABOUT PRINCIPLE OF COSMATIC EVALUATION
ChetanK57
 
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt...
Sérgio Sacani
 
GBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram StainingGBSN- Microbiology (Lab 3) Gram Staining
GBSN- Microbiology (Lab 3) Gram Staining
Areesha Ahmad
 
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
Astronomy Update- Curiosity’s exploration of Mars _ Local Briefs _ leadertele...
NathanBaughman3
 
Nutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technologyNutraceutical market, scope and growth: Herbal drug technology
Nutraceutical market, scope and growth: Herbal drug technology
Lokesh Patil
 
Structures and textures of metamorphic rocks
Structures and textures of metamorphic rocksStructures and textures of metamorphic rocks
Structures and textures of metamorphic rocks
kumarmathi863
 
Lateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensiveLateral Ventricles.pdf very easy good diagrams comprehensive
Lateral Ventricles.pdf very easy good diagrams comprehensive
silvermistyshot
 

Recently uploaded (20)

(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini...
 
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...The ASGCT Annual Meeting was packed with exciting progress in the field advan...
The ASGCT Annual Meeting was packed with exciting progress in the field advan...
 
platelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptxplatelets- lifespan -Clot retraction-disorders.pptx
platelets- lifespan -Clot retraction-disorders.pptx
 
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a...
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
Richard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlandsRichard's aventures in two entangled wonderlands
Richard's aventures in two entangled wonderlands
 
Orion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWSOrion Air Quality Monitoring Systems - CWS
Orion Air Quality Monitoring Systems - CWS
 
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
Circulatory system_ Laplace law. Ohms law.reynaults law,baro-chemo-receptors-...
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Cancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate PathwayCancer cell metabolism: special Reference to Lactate Pathway
Cancer cell metabolism: special Reference to Lactate Pathway
 
ESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptxESR_factors_affect-clinic significance-Pathysiology.pptx
ESR_factors_affect-clinic significance-Pathysiology.pptx
 
erythropoiesis-I_mechanism& clinical significance.pptx
erythropoiesis-I_mechanism& clinical significance.pptxerythropoiesis-I_mechanism& clinical significance.pptx
Open Chemistry, JupyterLab and data: Reproducible quantum chemistry

  • 1. Reproducible Quantum Chemistry Dr. Marcus D. Hanwell @mhanwell Technical Leader American Chemical Society Orlando, FL 31 March, 2019
  • 2. What Is Open Chemistry? ● Umbrella of related projects to coordinate and group ○ Focus on 3-clause BSD permissively licensed projects ○ Aims for a more complete solution ● Initially three related projects ○ Avogadro 2 - editor, visualization, interaction with a small number of molecules ○ MoleQueue - running computational jobs, abstracting local and remote execution ○ MongoChem - database for interacting with many molecules, summarizing data, informatics ● Evolved over the years but still retains many of those goals ○ GitHub organization with 35 repositories at the last count ● Umbrella organization in Google Summer of Code ○ Four years, with 3, 7, 7, and TBD students over a broad range of projects ○ Hope to continue this and other community engagement activities https://openchemistry.org/
  • 3. Why Jupyter? ● Supports interactive analysis while preserving the analytic steps ○ Preserves much of the provenance ● Familiar environment and language ○ Many are already familiar with the environment ○ Python is the language of scientific computing ● Simple extension mechanism ○ Particularly with JupyterLab ○ Allows for complex domain specific visualization ● Vibrant ecosystem and community
  • 4. Open Chemistry, Avogadro, Jupyter and Web ● Making data more accessible ● Federated, open data repositories ● Modern HTML5 interfaces ● JSON data format for NWChem data as a prototype, then add to other QM codes ● What about working with the data? ● Can we have chemistry from desktop to phone? ○ Create data, upload, organize ○ Search and analyze data ○ Share data - email, social media, publications ● What if we tied a data server to a Jupyter notebook? ● Can we make data a first class citizen in modern workflows?
  • 7. Increased Reusability ● Benefit from a huge number of open source packages/projects ● Quantum chemistry codes ○ NWChem, Psi4, ... ● Open source libraries/utilities ○ Avogadro, Open Babel, cclib, RDKit, ... ● Visualization, charting, etc ○ vtk.js, 3DMol.js, D3, plotly, matplotlib, ... ● Web frameworks ○ React, stencil.js, npm, ... ● Languages ○ C++, Python, JavaScript, TypeScript, ... ● Containers ○ Docker, singularity, shifter, ... Also version control such as git, continuous integration such as CircleCI, build systems such as CMake, project hosting such as GitHub, hardware accelerated rendering such as WebGL, queuing systems like grid engine, semantic data stores like Jena, format standards such as JSON, MessagePack, HDF5, XML, HTTP, RESTful web service standards, servers such as nginx, CherryPy, Flask, and many other components that are used directly or gave useful input
  • 8. Increased Reusability ● Developed on GitHub under permissive OSI-approved licenses ○ Industry standard 3-clause BSD and Apache 2 mainly ● Web widgets using stencil.js to offer web tags ● Binary wheels for Python-wrapped Avogadro core ○ pip install avogadro ● pip-installable Python modules for standard functions ○ pip install openchemistry ● JupyterLab extensions that can be installed locally ● Binder for “live” notebooks hosted in cloud containers ● Quantum codes and machine learning models in Docker containers ● Establishing data standards for reliable data exchange
  • 9. Approach and Philosophy ● Data is the core of the platform ○ Start with a simple but powerful data model and data server ● RESTful APIs are ubiquitous ○ Use from notebooks, apps, command line, desktop, etc ● Jupyter notebooks for interactive analysis ○ High level domain specific Python API within the notebooks ● Web application ○ Authentication, access control, management tasks ○ Launching, searching, managing notebooks ○ Interact with data outside of the notebook
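Because the data server is reached through a RESTful API, the same request can be issued from a notebook, an app, or the command line. The sketch below only prepares a search request with the Python standard library; the `/molecules` path, the query parameter, and the bearer-token header are hypothetical placeholders, not the platform's documented API.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_search_request(base_url: str, token: str, **filters) -> urllib.request.Request:
    """Prepare (but do not send) a GET request to a hypothetical molecule search endpoint.

    The "/molecules" path and the Authorization scheme are assumptions for illustration.
    """
    url = f"{base_url.rstrip('/')}/molecules?{urlencode(filters)}"
    headers = {"Accept": "application/json", "Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_json(request: urllib.request.Request):
    """Send the request and decode the JSON body (needs a reachable server)."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the same helper usable from a notebook cell or a small command line tool.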
  • 23. Reproducibility for Chemical-Physics Data ● Dream - share results like we can currently share code ● Links to interactive pages displaying data ● Those pages link to workflows/Jupyter notebooks ● From input geometry/molecule through to final figure ● Docker containers offer known, reproducible binary ○ Metadata has input parameters, container ID, etc ● Aid reproducibility, machine learning, and education ● Federate access, offer full worked examples - editable!
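One way to read "metadata has input parameters, container ID" is as a provenance record stored next to each result. The sketch below is an assumption about what such a record might hold; the field names and the `provenance_record` helper are hypothetical. It hashes a canonical form of the inputs and keeps the immutable image ID alongside the human-readable tag, since a tag such as `latest` can point at a different image later.

```python
import hashlib
import json
from typing import Dict


def sha256_text(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def provenance_record(geometry_xyz: str, parameters: Dict, image: str, image_id: str) -> Dict:
    """Collect what is needed to rerun one containerized calculation.

    Field names are hypothetical; they are not the platform's schema.
    """
    # Canonical JSON (sorted keys, fixed separators) so that equal parameters
    # always produce the same hash, whatever order the keys were written in
    canonical = json.dumps(parameters, sort_keys=True, separators=(",", ":"))
    return {
        "image": image,          # human-readable tag, e.g. "openchemistry/psi4:latest"
        "imageId": image_id,     # immutable image ID; a tag like "latest" can move
        "parameters": parameters,
        "inputHashes": {
            "geometry": sha256_text(geometry_xyz),
            "parameters": sha256_text(canonical),
        },
    }
```

The image ID itself can be obtained with `docker image inspect --format '{{.Id}}' openchemistry/psi4:latest`.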
  • 24. Docker Containers for Chemical-Physics ● Developed three containers so far to serve the platform ○ NWChem and Psi4 for computational chemistry ○ ChemML for machine learning ● These containers are self-contained workflow tools ○ Take JSON and input geometry ○ Use a Python-based execution script ○ Output JSON and optionally all output logs/data ● Run using Docker, Singularity, and soon Shifter, on AWS, locally, and at NERSC ● Simple contract making it easy to add more codes to the platform ○ Take some standard input, translate for your code, translate to standard output ○ Get workflow management, integration with Jupyter, visualization, ... ● The Dockerfile has build instructions, DockerHub hosts images
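The "simple contract" (standard input in, code-specific work in the middle, standard output out) can be sketched as a wrapper that owns the parsing and serialization and delegates only the middle step. This is a minimal illustration under assumptions: `run_contract` and the output keys are hypothetical, and `code` stands in for whatever translation, execution, and parsing a real container performs.

```python
import json
from typing import Callable, Dict, List, Tuple

# (element symbol, x, y, z)
Atom = Tuple[str, float, float, float]


def read_xyz(text: str) -> List[Atom]:
    """Parse XYZ text: atom count, a comment line, then one 'symbol x y z' line per atom."""
    lines = text.strip().splitlines()
    count = int(lines[0])
    atoms: List[Atom] = []
    for line in lines[2:2 + count]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms


def run_contract(geometry_text: str, parameters_text: str,
                 code: Callable[[List[Atom], Dict], Dict]) -> str:
    """Standard input in, code-specific work in the middle, standard JSON out.

    `code` stands in for the translate-run-parse step of one specific program;
    the output keys below are illustrative, not the platform's actual schema.
    """
    atoms = read_xyz(geometry_text)
    params = json.loads(parameters_text)
    properties = code(atoms, params)
    return json.dumps({
        "atoms": [atom[0] for atom in atoms],
        "parameters": params,
        "properties": properties,
    })
```

Adding a new code then means supplying only a new `code` callable; the wrapper, and everything downstream that reads its JSON, stays the same.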
  • 26. Running a Psi4 Docker Container ● Can be run independently of the framework ● docker run -v $(pwd):/data openchemistry/psi4:latest -g /data/geometry.xyz -p /data/parameters.json -o /data/out.cjson -s /data/scratch ● Runs a Python driver script that interprets switches ● Performs input/output translation, input generation, etc. ● Packages a code for use in a larger workflow
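The driver script that interprets the `-g`, `-p`, `-o`, and `-s` switches could start with an `argparse` parser like the one below. It is a sketch, not the container's actual script: the long option names and the `/tmp` default for scratch are assumptions; only the short switches appear on the slide.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the switches shown on the slide; long names and defaults are assumed."""
    parser = argparse.ArgumentParser(description="Containerized quantum chemistry driver (sketch)")
    parser.add_argument("-g", "--geometry", required=True, help="input geometry, e.g. an XYZ file")
    parser.add_argument("-p", "--parameters", required=True, help="JSON file of calculation parameters")
    parser.add_argument("-o", "--output", required=True, help="path for the Chemical JSON output")
    parser.add_argument("-s", "--scratch", default="/tmp", help="scratch directory (assumed default)")
    return parser.parse_args(argv)
```

Because the container's entry point receives everything after the image name, the `docker run` line above would reach this parser as `["-g", "/data/geometry.xyz", "-p", ...]`.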
  • 27. Running a NWChem Docker Container ● Can be run independently of the framework ● docker run -v $(pwd):/data openchemistry/nwchem:latest -g /data/geometry.xyz -p /data/parameters.json -o /data/out.cjson -s /data/scratch ● Runs a Python driver script that interprets switches ● Performs input/output translation, input generation, etc. ● Packages a code for use in a larger workflow
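The Psi4 and NWChem invocations differ only in the image name, which is the point of the shared contract. A workflow tool could therefore build the command once for any code. The helper below is hypothetical and assumes images follow the `openchemistry/<code>:<tag>` naming seen on the slides and that the input files sit in the mounted directory.

```python
import shlex
from typing import List


def docker_command(code: str, workdir: str, tag: str = "latest") -> List[str]:
    """Build the docker run argument list from the slides for a given code.

    Assumes images are named openchemistry/<code>:<tag> and that geometry.xyz
    and parameters.json already exist in workdir (mounted at /data).
    """
    return [
        "docker", "run", "-v", f"{workdir}:/data", f"openchemistry/{code}:{tag}",
        "-g", "/data/geometry.xyz",
        "-p", "/data/parameters.json",
        "-o", "/data/out.cjson",
        "-s", "/data/scratch",
    ]


# Show the equivalent shell command; to run it (requires Docker) use
# subprocess.run(docker_command("nwchem", os.getcwd()), check=True)
print(shlex.join(docker_command("nwchem", "/home/user/calc")))
```

Passing a list rather than a single string to `subprocess.run` avoids shell quoting problems when the working directory contains spaces.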
  • 28. Export to Binder ● Goes beyond simply showing the static notebook ● Specific GitHub repository layout ○ Install custom Python modules ○ Install JupyterLab extensions ● Service builds a container on the fly ● Can click on a link and run the example container http://mybinder.org/v2/gh/openchemistry/jupyter-examples/master?urlpath=lab/tree/caffeine.ipynb
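The Binder link on the slide follows the pattern `mybinder.org/v2/gh/<org>/<repo>/<ref>?urlpath=lab/tree/<notebook>`. A small helper can generate such links for other notebooks in an examples repository. This is a sketch: the function name is hypothetical, and it emits `https` where the slide shows `http`.

```python
from urllib.parse import quote


def binder_url(org: str, repo: str, ref: str, notebook: str) -> str:
    """Build a mybinder.org link that opens `notebook` in JupyterLab."""
    # quote() keeps "/" but escapes characters such as spaces in the notebook path
    return f"https://mybinder.org/v2/gh/{org}/{repo}/{ref}?urlpath={quote('lab/tree/' + notebook)}"


print(binder_url("openchemistry", "jupyter-examples", "master", "caffeine.ipynb"))
```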
  • 30. Machine Learning ● What happens after your model is trained and published? ● Can we treat machine learning models like other codes making predictions? ● Lots of new moving parts that need to be managed ○ The actual machine learning code, possible accelerator access, etc ○ The trained model, loading it, executing it reproducibly ○ Generation of relevant descriptors as part of the input ○ Extracting output, storing, displaying, and visualizing data ● Starts to share a number of commonalities with other simulations ● Important differences too ○ Narrower focus for most models ○ Possibility to augment trained models, create derived models
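Treating a model "like other codes making predictions" suggests a shared interface that a workflow can call without knowing what is behind it. The sketch below assumes such a `Predictor` protocol (hypothetical) and implements it with a toy linear model over element counts. Its weights are arbitrary placeholders rather than fitted values, and it shows descriptor generation happening as part of the input step, as the slide lists.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple

# (element symbol, x, y, z)
Atom = Tuple[str, float, float, float]


class Predictor(Protocol):
    """Interface assumed here for both simulation codes and ML models."""
    def run(self, atoms: List[Atom], params: Dict) -> Dict: ...


@dataclass
class LinearAtomCountModel:
    """Toy 'trained model': prediction = bias + sum of per-element weights.

    The weights are placeholders for illustration, not a real trained model.
    """
    weights: Dict[str, float]
    bias: float = 0.0
    version: str = "toy-0"

    def descriptors(self, atoms: List[Atom]) -> Dict[str, int]:
        # Descriptor generation as part of the input: here, simple element counts
        return dict(Counter(atom[0] for atom in atoms))

    def run(self, atoms: List[Atom], params: Dict) -> Dict:
        counts = self.descriptors(atoms)
        value = self.bias + sum(self.weights.get(el, 0.0) * n for el, n in counts.items())
        # Return the model version with the prediction so results stay traceable
        return {"prediction": value, "descriptors": counts, "model": self.version}


def run_any(predictor: Predictor, atoms: List[Atom], params: Dict) -> Dict:
    # The workflow does not care whether this is a QM code or an ML model
    return predictor.run(atoms, params)
```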
  • 31. Running ChemML in a Jupyter Notebook
  • 32. Data Mining ● When running calculations all data, metadata, workflows are captured ● Creation of a structured data store with a friendly frontend ● Possible to run queries and perform analytics on the generated data ● Machine learning can feed off of this data ○ Reuse the same infrastructure to initiate and generate new data ○ Comparison of predicted data to computational codes, experimental data ○ Use of a familiar JupyterLab interface ● Augmenting the notebook with a data server that can access compute ○ Notebook acts as initiator for large jobs ○ Returning to the notebook later to check on progress ● Independent RESTful APIs, web frontend, batch export of data
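Once calculations are stored as structured records, querying and comparing predictions against computed values become simple operations. The in-memory sketch below loosely mirrors MongoDB's dot notation for nested fields. The record layout and field names are hypothetical, and the records are assumed to contain both fields being compared.

```python
from typing import Any, Dict, Iterable, List, Optional


def get_field(record: Dict, dotted: str) -> Optional[Any]:
    """Follow a dot-separated path such as 'parameters.basis' into nested dicts."""
    value: Any = record
    for key in dotted.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def query(records: Iterable[Dict], criteria: Dict[str, Any]) -> List[Dict]:
    """Return the records whose fields equal every value in `criteria`."""
    return [r for r in records
            if all(get_field(r, path) == wanted for path, wanted in criteria.items())]


def mean_absolute_error(records: Iterable[Dict], predicted: str, reference: str) -> float:
    """Compare a predicted field with a reference (computed or experimental) field."""
    diffs = [abs(get_field(r, predicted) - get_field(r, reference)) for r in records]
    if not diffs:
        raise ValueError("no records to compare")
    return sum(diffs) / len(diffs)
```

A typical use is to select one level of theory with `query(records, {"parameters.basis": "cc-pvdz"})` and then call `mean_absolute_error(subset, "energy.predicted", "energy.computed")`.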
  • 33. Chemical JSON ● Developed to support projects (~2011) ● Stores structure, geometry, identifiers, descriptors, other useful data ● Benefits: ○ More compact than XML/CML ○ Native to MongoDB, JSON-RPC, REST ○ Easily converted to binary representation ● Now features basis sets, MOs, sets ● MessagePack a good option for binary ● Maps easily to HDF5 binary data store ● MolSSI JSON schema collaboration
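To make the Chemical JSON description concrete, the sketch below converts XYZ text into a minimal CJSON-style dictionary with atomic numbers and a flat 3D coordinate list. The key names follow the Avogadro 2 CJSON layout as we understand it, which is an assumption worth checking against the current schema. The element table is deliberately tiny. The second helper shows why binary conversion is easy: the flat coordinate list maps directly onto a packed float64 array.

```python
from array import array
from typing import Dict, List

# Small lookup table for this example only; a real converter covers the periodic table
ATOMIC_NUMBERS = {"H": 1, "C": 6, "N": 7, "O": 8}


def xyz_to_cjson(xyz: str, name: str = "") -> Dict:
    """Convert XYZ text into a minimal Chemical JSON-style dictionary.

    Key names follow the Avogadro 2 CJSON layout as we understand it (assumption).
    """
    lines = xyz.strip().splitlines()
    count = int(lines[0])
    numbers: List[int] = []
    coords: List[float] = []
    for line in lines[2:2 + count]:
        symbol, x, y, z = line.split()[:4]
        numbers.append(ATOMIC_NUMBERS[symbol])
        coords.extend((float(x), float(y), float(z)))  # flat [x0, y0, z0, x1, ...]
    return {
        "chemicalJson": 1,
        "name": name,
        "atoms": {"elements": {"number": numbers}, "coords": {"3d": coords}},
    }


def coords_to_bytes(cjson: Dict) -> bytes:
    # 8 bytes per coordinate: the same contiguous float64 layout an HDF5
    # dataset or a binary MessagePack field could hold
    return array("d", cjson["atoms"]["coords"]["3d"]).tobytes()
```

Because the result is plain dictionaries and lists, `json.dumps` serializes it directly, and the same structure can be stored as a MongoDB document.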
  • 34. Papers and a Little History on Chemical JSON ● Quixote collaboration with Peter Murray-Rust (2011) ○ “The Quixote project: Collaborative and Open Quantum Chemistry data management in the Internet age”, https://doi.org/10.1186/1758-2946-3-38 ● Early work in CML with NWChem and Avogadro (2013) ○ “From data to analysis: linking NWChem and Avogadro with the syntax and semantics of Chemical Markup Language” https://doi.org/10.1186/1758-2946-5-25 ● Later moved to JSON, RESTful API, visualization (2017) ○ “Open chemistry: RESTful web APIs, JSON, NWChem and the modern web application” ○ https://doi.org/10.1186/s13321-017-0241-z ● Interested in Linked Data, JSON-LD, and how they might be layered on top ● Use of BSON, HDF5, and related technologies for binary data ● BSD licensed reference implementations
  • 35. Pillars of Phase II SBIR Project 1. Data and metadata ○ JSON, JSON-LD, HDF5 and semantic web 2. Server platform ○ RESTful APIs, computational chemistry, data, machine learning, HPC/cloud, and triple store 3. Jupyter integration ○ Computational chemistry, data, machine learning, query, analytics, and data visualization 4. Web application ○ Management interfaces, single-page interface, notebook/data browser, and search 5. Avogadro and local Python ○ Python shell integration, extension of Avogadro to use server interface, editing data on server Regular automated software deployments, releases with Docker containers
  • 36. Closing Thoughts ● Nearly halfway through the Phase II project ● Data and software are both central and core to the platform ● Highly reusable through licensing, modular nature, data standards, containers ● Augmented by abstracted access to compute resources ● Open source, developing entry points for customization and extension ● Building on best-of-breed open source community projects ● Extending to better support the chemistry community ○ Just at the start of making machine learning and data mining first class citizens ● User friendly interfaces, Python at the core, visualization, data analytics ● SBIR funding from DOE Office of Science contract DE-SC0017193 ○ Collaborating with Bert de Jong at Berkeley Lab and Johannes Hachmann at SUNY Buffalo