The PubChemQC Project 
A big data construction by first-principles 
calculations of molecules 
中田真秀(NAKATA Maho) 
ACCC RIKEN 
maho@riken.jp 
2014/12/3 10:35-11:05 
JST CREST International Symposium on Post 
Petascale System Software
Background 
• All matter is composed of atoms and molecules. 
• The dream of theoretical chemist: do chemistry 
without experiment! 
• On computers  
• We treat big data in chemistry! 
– Chemical space is really huge! 
• The number of candidates for drugs: 10^60 
(http://onlinelibrary.wiley.com/doi/10.1002/wcms.1104/abstract) 
• Cf. Exa: 10^18
Current status of computational 
chemistry 
• Relatively good agreements with experiments. 
• Can explain nature in many cases. 
– Many good quantum chemistry programs are 
available! 
– “DFT B3LYP 6-31G*” calculations rule! 
• We want to lead chemistry 
– We only explain what happened.
Difference between experiment and 
calculation/theory 
• Finding interesting phenomena or problem 
– How do we convert CO2 to O2? N2 + H2 to NH3? 
– How to synthesize a compound? 
• Design a key chemical reaction. 
• Calculations 
• Experiments 
– Analyze 
• Analysis of results 
• Propose new experiments 
Only One Difference
Difference between experiment and 
calculation/theory 
• No difference as science 
• Most important thing is curiosity! 
New insights from 
big data and 
my sensitivity! 
Unfortunately, there is not much easy-to-use 
big data for chemistry
Googling molecule 
+ 
Give you recommended molecules!
What are needed for Googling molecule? 
1. Types, kinds, variety of molecules 
– The number of molecules is infinite, but we cover the important ones 
2. Required properties of molecules 
– Molecular structure, energy, UV excitation energy, 
dipole moment 
3. Getting properties of molecules by calculation? 
– Accuracy of calculation, and computer resources… 
4. Coding or Encoding molecule 
– IUPAC nomenclature is not suitable 
– Do not think about graph theory
Databases for lists of molecules 
• PubChem: 50,000,000 molecules listed, made by NIH, 
public domain, not curated (imported from catalogs, 
etc.), obtainable via ftp. 
• ChemSpider: 28,000,000 entries, better curated, no 
ftp. Redistribution and download are restricted. 
• Web-GDB13 : 900,000,000 entries, just generated by 
combinatorics. No 
• Zinc, CheMBL, DrugBank … 
• CAS : 70,000,000 molecules, proprietary 
• Nikkaji: 6,000,000, proprietary 
We use PubChem as the source of molecules
The PubChem
Ex. A molecule listed in PubChem
Database for molecular properties by 
experiments 
• We must do some experiments for obtaining 
molecular properties. 
– No free comprehensive database is known so far. 
– Pharmaceutical companies do O(1,000,000) 
experiments for high throughput screening. 
• Experiments are hugely expensive! 
– Time-consuming, costly, hazardous, and they need large facilities 
We do not do experiments!
Database for molecular properties by computer 
calculation 
• Gold-standard method: “Density functional 
theory (B3LYP functional) + 6-31G(d) basis set” 
– Accuracy is quite satisfactory (1-10 kcal/mol) for 
biological systems and organic chemistry. 
– Good implementations are available. 
– Costs less (fast, needs only a supercomputer, no hazards) 
– Calculation times keep getting shorter 
• Intel Core i7 (esp. SandyBridge) is very fast. 
• Still we need huge resources, though. 
We calculate by computer instead!
What is a molecule? 
No rigorous definition for a molecule 
Example: propionaldehyde 
Representations: wavefunction, 3D coordinates, structure, IUPAC nomenclature, common name 
Hard to understand but rigorous 
Easy to understand but many corner cases 
(Figure from Wikipedia)
What is a molecule? 
• No rigorous definition for “what is a molecule” 
• nomenclature 
– 3D coordinates for nucleus 
– Structural formula 
– IUPAC nomenclature 
– Higher abstraction or less abstraction? 
• Better molecular encoding method? 
– Easy to understand for human 
– Easy to understand for computer as well 
– Can describe most cases, and less corner cases. 
– Compromise between dream and reality
Encoding molecule : SMILES 
Encoding molecule 
IUPAC nomenclature 
tert-butyl N-[(2S,3S,5S)-5-[[4-[(1-benzyltetrazol-5-yl) 
methoxy]phenyl]methyl]-3-hydroxy-6-[[(1S,2R)- 
2-hydroxy-2,3-dihydro-1H-inden-1-yl]amino]- 
6-oxo-1-phenylhexan-2-yl]carbamate 
We can encode molecule 
• SMILES 
CN(C)CCOC12CCC(C3C1CCCC3)C4=CC=CC=C24 
• InChI Made by IUPAC 
InChI=1S/C20H29NO/c1-21(2)13-14-22-20-12-11 
-15(16-7-3-5-9-18(16)20)17-8-4-6-10-19(17)20/ 
h3,5,7,9,15,17,19H,4,6,8,10-14H2,1-2H3 
… 
SMILES is a good encoding method for molecules
What is SMILES? 
• Simplified Molecular Input Line Entry System 
– A linear representation of molecule using ASCII. 
– Stereochemistry (configuration) can also be encoded 
– Human readable, and also machine readable. 
– Almost one-to-one mapping between a molecule and 
SMILES via universal SMILES 
• David Weininger at USEPA Mid-Continent Ecology Division Laboratory invented SMILES 
• InChI by IUPAC 
– International Chemical Identifier : open standard (non proprietary) 
– NM O’Boyle invented “Universal SMILES” via InChI
Example by SMILES 
http://en.wikipedia.org/wiki/SMILES 
Molecule | Structure | SMILES 
Nitrogen molecule | N≡N | N#N 
Copper sulfate | Cu2+ SO4 2- | [Cu+2].[O-]S(=O)(=O)[O-] 
Oenanthotoxin | (figure) | CCC[C@@H](O)CCC=CC=CC#CC#CC=CCO 
Vitamin B1 | (figure) | OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2 
Aflatoxin B1 | (figure) | O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5
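How a program reads such a string can be sketched with a small tokenizer. This is a minimal illustration using only the Python standard library. The regular expression covers just the symbols that appear in the examples above and is an assumption, not the full SMILES grammar. The names `tokenize_smiles` and `count_atom_tokens` are hypothetical helpers.

```python
import re

# Simplified token pattern (an assumption for illustration, not the full grammar):
# bracket atoms, Br/Cl, one-letter aliphatic and aromatic atoms, bond symbols,
# branch parentheses, ring-closure digits, and the dot separating fragments.
TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|[BCNOPSFI]|[bcnops]|[=#/\\\-:]|[()]|%\d{2}|\d|\."
)

def tokenize_smiles(smiles):
    """Split a SMILES string into tokens; fail loudly on anything unrecognized."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens

def count_atom_tokens(smiles):
    """Count atom tokens (bracket atoms and bare element symbols).

    Implicit hydrogens are not counted; an explicit [H] would be counted.
    """
    return sum(1 for t in tokenize_smiles(smiles) if t[0] == "[" or t[0].isalpha())

print(tokenize_smiles("N#N"))
print(count_atom_tokens("[Cu+2].[O-]S(=O)(=O)[O-]"))
```

A real project would rely on a cheminformatics toolkit such as OpenBABEL instead of a hand-written pattern; the sketch only shows that the notation is regular enough to be read by machine.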
Some corner cases 
Two different SMILES for Ferrocene 
• C12C3C4C5C1[Fe]23451234C5C1C2C3C45 
• [CH-]1C=CC=C1.[CH-]1C=CC=C1.[Fe+2]
Now it's my turn
Construction of ab initio chemical 
database 
• Molecular information is from PubChem 
• Properties are calculated from first principles by 
computer 
– Many program packages are available 
– DFT (B3LYP) 
– 6-31G(d) basis set and geometry optimization 
– Excited-state calculation by TD-DFT/6-31+G(d) 
– Best for organic molecules or biomolecules 
• Molecular encoding: SMILES / InChI 
• Huge computer resources 
• A dream come true 
– Google like search engine for chemistry
The PubChemQC Project 
• http://pubchemqc.riken.jp/ 
• An open database for molecules 
– Public domain 
• Ab initio (first-principles) calculation of 
molecular properties of PubChem compounds 
• 2014/1/15: 13,000 molecules 
• 2014/7/29 : 155,792 molecules 
• 2014/10/30 : 906,798 molecules 
• 2014/12/3 : 1,137,286 molecules
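The average throughput implied by these milestones can be worked out directly. The sketch below uses only the dates and counts listed above; the helper name `average_rates` is hypothetical.

```python
from datetime import date

# Cumulative molecule counts as reported on the slide.
milestones = [
    (date(2014, 1, 15), 13_000),
    (date(2014, 7, 29), 155_792),
    (date(2014, 10, 30), 906_798),
    (date(2014, 12, 3), 1_137_286),
]

def average_rates(points):
    """Return (start, end, molecules per day) for each consecutive pair of milestones."""
    rates = []
    for (d0, n0), (d1, n1) in zip(points, points[1:]):
        days = (d1 - d0).days
        rates.append((d0, d1, (n1 - n0) / days))
    return rates

for start, end, rate in average_rates(milestones):
    print(f"{start} -> {end}: about {rate:,.0f} molecules/day")
```

For the last interval (34 days, 230,488 new molecules) this gives roughly 6,800 molecules per day.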
The PubChemQC project 
http://pubchemqc.riken.jp/ 
WIP: no search engine, just data
Related works 
• Related works 
– NIST Web Book 
• http://webbook.nist.gov/chemistry/ 
• Small numbers of molecules. Comparing many methods 
– Harvard Clean Energy Project 
• http://cleanenergy.molecularspace.org/ 
• 25,000,000 (?) molecules for photo devices, generated by 
combinatorics 
– Sugimoto et al.: 2013 CBI symposium poster 
• Almost the same as our database; currently not open to the 
public (now??)
How do we do it? 
• Generate the initial 3D conformation by OpenBABEL 
– SDF contains a 3D conformation but we don’t use it. 
– OpenBABEL -h (add hydrogens) --gen3d (generate 3D 
coordinates) 
• Ab initio calculation by GAMESS+Firefly 
– Using Gaussian can lead to a political problem(?) 
– PM3 optimization 
– Hartree-Fock/STO-6G geometry optimization 
– Firefly+GAMESS geometry optimization in B3LYP/6-31G* 
– Ten excitation energies by TDDFT/6-31+G* (no geometry 
optimization)
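The first step above can be sketched as a command line built from Python. This is only an illustration under stated assumptions: it uses Open Babel's `obabel` executable with its inline-SMILES input (`-:`), `-O`, `-h` and `--gen3d` options. The project's actual invocation is not shown on the slides, and the function name and output file name are hypothetical. The command is built and printed, not executed.

```python
import shlex

def obabel_gen3d_command(smiles, out_path):
    """Build (but do not run) an Open Babel command that generates 3D coordinates."""
    return [
        "obabel",
        f"-:{smiles}",    # inline SMILES input
        "-O", out_path,   # output file; format is inferred from the extension
        "-h",             # add hydrogens
        "--gen3d",        # generate 3D coordinates
    ]

# Propionaldehyde as a small example; the file name is a placeholder.
cmd = obabel_gen3d_command("CCC=O", "propionaldehyde.mol")
print(shlex.join(cmd))
```

With Open Babel installed, the list could be passed to `subprocess.run(cmd, check=True)`; the resulting geometry would then be the starting point for the PM3, Hartree-Fock and B3LYP optimization steps.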
How do we do it? 
• Heavily using OpenBABEL 
• Extraction of molecular information 
– Sort PubChem compounds by molecular weight 
– OpenBABEL 
• Encoded by SMILES 
– Isomeric SMILES: stereochemistry retained 
– OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)1 
– CCC[C@@H](O)CCC=CC=CC#CC#CC=CCO 
– CC(=O)OCCC(/C)=CC[C@H](C(C)=C)CCC=C
Our route from a PubChem compound to a 
quantum chemistry calculation 
aflatoxin 
O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5 
Ab initio calculation by 
OpenBABEL
Final results will be 
• Uploaded to http://pubchemqc.riken.jp/ 
• Currently we upload 
– input file (ground / excited state) 
– Output file (ground / excited state) 
– Final geometry in Mol file
Scaling of computation 
• Embarrassingly parallel for each molecule 
• Very roughly speaking, required time for 
calculation scales like N^4 
– N : molecular weight 
• Problems are very hard (complexity theory) 
– Hartree-Fock calculation 
– DFT (b3lyp) calculation 
– geometry optimization 
• Practically many molecules can be solved 
efficiently
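The rough N^4 rule can be turned into a quick estimate of relative cost. The sketch below simply applies the slide's scaling estimate; the reference molecular weight of 160 and the function name `relative_cost` are choices made for illustration, and real timings depend on much more than molecular weight.

```python
def relative_cost(mw, reference_mw=160.0, exponent=4.0):
    """Cost of a molecule of weight `mw` relative to one of weight `reference_mw`,
    assuming time scales like (molecular weight) ** exponent."""
    return (mw / reference_mw) ** exponent

for mw in (160, 200, 300, 500):
    print(f"MW {mw}: about {relative_cost(mw):.1f}x the cost of MW 160")
```

Under this assumption, doubling the molecular weight multiplies the cost by 16, which is why throughput drops so sharply for heavier molecules.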
Computer Resources 
• RICC: (Intel Xeon 5570 Westmere, 2.93GHz, 8 
cores/node) x 1000 
– 1000-10000 molecules/day (MW 160) 
– Heavily depend on conditions of other users 
– Time limit: 8 hours 
• Quest : Intel Core2 duo (1.6GHz/node) x 700 
– 3000-8000 molecules / day (MW 160) 
– 100-1000 molecules / day (MW 200-300) 
– Time limit: 20 hours 
• Compounds whose calculations fail are ignored for 
now.
Computer Resources 
• Storage 
– Approx. 500GB for 1,000,000 molecules (xz 
compressed) 
– Approx. 20 TB for 40,000,000 molecules (xz 
compressed)
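The two storage figures are consistent with a simple linear extrapolation, as the following sketch shows. It assumes decimal units (1 GB = 10^9 bytes) and a constant average size per molecule; the function name is hypothetical.

```python
GB = 10**9
TB = 10**12

def storage_estimate_tb(n_molecules, sample_bytes=500 * GB, sample_molecules=1_000_000):
    """Extrapolate compressed storage linearly from a measured sample."""
    bytes_per_molecule = sample_bytes / sample_molecules  # about 0.5 MB each
    return n_molecules * bytes_per_molecule / TB

print(storage_estimate_tb(40_000_000))  # 20.0
```

Larger molecules produce larger output files, so the per-molecule average may grow as the project moves to higher molecular weights.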
Molecular weight and Lipinski Rule 
• Lipinski’s rule of five (Pfizer's rule of five): a rule of 
thumb for drug discovery 
• No more than 5 hydrogen bond donors 
• Not more than 10 hydrogen bond acceptors 
• A molecular mass less than 500 daltons 
• An octanol-water partition coefficient log P not greater than 5 
• A molecular weight below 500 is also 
very convenient for computational chemistry 
– For routine calculations without experimental data 
other than the molecular formula 
– Above 500, secondary or higher-order structure 
becomes important, e.g., in proteins
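The four criteria above translate directly into a small check. This is a minimal sketch: the descriptor values have to come from another tool or database, the class and function names are hypothetical, and the two example molecules use made-up numbers purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Descriptors:
    """Precomputed molecular descriptors (obtained elsewhere)."""
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_mass: float  # daltons
    log_p: float           # octanol-water partition coefficient

def lipinski_violations(d):
    """Return the rule-of-five criteria that the molecule violates."""
    violations = []
    if d.h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if d.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    if d.molecular_mass >= 500:
        violations.append("molecular mass not below 500 daltons")
    if d.log_p > 5:
        violations.append("log P greater than 5")
    return violations

# Hypothetical descriptor values, for illustration only.
small = Descriptors(h_bond_donors=2, h_bond_acceptors=4, molecular_mass=310.4, log_p=2.1)
large = Descriptors(h_bond_donors=7, h_bond_acceptors=12, molecular_mass=812.0, log_p=6.3)
print(lipinski_violations(small))
print(lipinski_violations(large))
```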
Molecular Weight distribution at 
PubChem 
Lipinski limit MW=500 
We are still here 
30,000,000 molecules 
(excluding mixtures)
How long will it take to finish? 
• For drug design, we need to calculate all 
molecules of MW < 500 
• Total 30,000,000 molecules 
– This number may increase in the future 
• Current (2014/12/4): 1,100,000 molecules 
– Only about 3.7% 
• At 10,000 molecules/day -> 8.2 years
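The 8.2-year figure follows from dividing the target by the daily throughput. The sketch below reproduces it and also shows the slightly smaller estimate obtained when the molecules already finished are subtracted. It assumes a constant rate and a fixed target; the function name is hypothetical.

```python
def years_to_finish(total, per_day, done=0):
    """Years needed to compute the remaining molecules at a constant daily rate."""
    return (total - done) / per_day / 365.0

total = 30_000_000   # PubChem molecules with MW < 500 (from the slide)
done = 1_100_000     # computed as of 2014/12/4 (from the slide)

print(f"fraction done: {done / total:.1%}")
print(f"from scratch:  {years_to_finish(total, 10_000):.1f} years")
print(f"remaining:     {years_to_finish(total, 10_000, done):.1f} years")
```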
How long will it take to finish? 
• 10+ years? No, maybe far less. 
• 25 years ago (1990), computers were very slow 
– Even ab initio calculations were very difficult on 
486DX@25MHz or 
68000@10MHz
Outlook, prospect, hope… 
• Far better in silico screening 
– Less or no experiment is necessary 
• Even faster calculation using machine learning 
– 10,000 molecules / second ? 
– Using our data as learning set. 
– Not difficult for bio or organic molecules 
– Far better initial guess 
• Database for chemical reaction 
– Precise calculation is required 
– GRRM method + machine learning (?) 
• Geometry optimization for Protein (PDB) 
– Only X ray crystal structures are available 
http://pubchemqc.riken.jp/
Difficulties in this project 
• Parameters needed for calculations vary from 
molecule to molecule 
• Properties can differ depending on the initial guess 
• Computer resources 
– Raspberry Pi? NVIDIA Jetson? BOINC? 
• Molecular encoding never ends 
– Neither SMILES nor InChI is complete 
– Some corner cases may be chemically interesting.

More Related Content

What's hot

Towards Exascale Simulations of Stellar Explosions with FLASH
Towards Exascale  Simulations of Stellar  Explosions with FLASHTowards Exascale  Simulations of Stellar  Explosions with FLASH
Towards Exascale Simulations of Stellar Explosions with FLASH
Ganesan Narayanasamy
 
Runtime Performance Optimizations for an OpenFOAM Simulation
Runtime Performance Optimizations for an OpenFOAM SimulationRuntime Performance Optimizations for an OpenFOAM Simulation
Runtime Performance Optimizations for an OpenFOAM Simulation
Fisnik Kraja
 
News from NNPDF: new data and fits with intrinsic charm
News from NNPDF: new data and fits with intrinsic charmNews from NNPDF: new data and fits with intrinsic charm
News from NNPDF: new data and fits with intrinsic charm
juanrojochacon
 
20190314 cern register v3
20190314 cern register v320190314 cern register v3
20190314 cern register v3
Tim Bell
 
強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷
Eiji Sekiya
 
The World Wide Distributed Computing Architecture of the LHC Datagrid
The World Wide Distributed Computing Architecture of the LHC DatagridThe World Wide Distributed Computing Architecture of the LHC Datagrid
The World Wide Distributed Computing Architecture of the LHC DatagridSwiss Big Data User Group
 
第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)
RCCSRENKEI
 
A comparison of molecular dynamics simulations using GROMACS with GPU and CPU
A comparison of molecular dynamics simulations using GROMACS with GPU and CPUA comparison of molecular dynamics simulations using GROMACS with GPU and CPU
A comparison of molecular dynamics simulations using GROMACS with GPU and CPU
Alex Camargo
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
Preferred Networks
 
PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)
Hansol Kang
 
HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015
Karel Ha
 
Implementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on SparkImplementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on Spark
Dalei Li
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistry
Kenta Oono
 
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
Preferred Networks
 
쉽게 설명하는 GAN (What is this? Gum? It's GAN.)
쉽게 설명하는 GAN (What is this? Gum? It's GAN.)쉽게 설명하는 GAN (What is this? Gum? It's GAN.)
쉽게 설명하는 GAN (What is this? Gum? It's GAN.)
Hansol Kang
 

What's hot (15)

Towards Exascale Simulations of Stellar Explosions with FLASH
Towards Exascale  Simulations of Stellar  Explosions with FLASHTowards Exascale  Simulations of Stellar  Explosions with FLASH
Towards Exascale Simulations of Stellar Explosions with FLASH
 
Runtime Performance Optimizations for an OpenFOAM Simulation
Runtime Performance Optimizations for an OpenFOAM SimulationRuntime Performance Optimizations for an OpenFOAM Simulation
Runtime Performance Optimizations for an OpenFOAM Simulation
 
News from NNPDF: new data and fits with intrinsic charm
News from NNPDF: new data and fits with intrinsic charmNews from NNPDF: new data and fits with intrinsic charm
News from NNPDF: new data and fits with intrinsic charm
 
20190314 cern register v3
20190314 cern register v320190314 cern register v3
20190314 cern register v3
 
強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷強化学習の分散アーキテクチャ変遷
強化学習の分散アーキテクチャ変遷
 
The World Wide Distributed Computing Architecture of the LHC Datagrid
The World Wide Distributed Computing Architecture of the LHC DatagridThe World Wide Distributed Computing Architecture of the LHC Datagrid
The World Wide Distributed Computing Architecture of the LHC Datagrid
 
第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)第13回 配信講義 計算科学技術特論A(2021)
第13回 配信講義 計算科学技術特論A(2021)
 
A comparison of molecular dynamics simulations using GROMACS with GPU and CPU
A comparison of molecular dynamics simulations using GROMACS with GPU and CPUA comparison of molecular dynamics simulations using GROMACS with GPU and CPU
A comparison of molecular dynamics simulations using GROMACS with GPU and CPU
 
IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」IIBMP2019 講演資料「オープンソースで始める深層学習」
IIBMP2019 講演資料「オープンソースで始める深層学習」
 
PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)
 
HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015HTCC poster for CERN Openlab opendays 2015
HTCC poster for CERN Openlab opendays 2015
 
Implementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on SparkImplementation of linear regression and logistic regression on Spark
Implementation of linear regression and logistic regression on Spark
 
Deep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistryDeep learning for molecules, introduction to chainer chemistry
Deep learning for molecules, introduction to chainer chemistry
 
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
PFN Summer Internship 2019 / Kenshin Abe: Extension of Chainer-Chemistry for ...
 
쉽게 설명하는 GAN (What is this? Gum? It's GAN.)
쉽게 설명하는 GAN (What is this? Gum? It's GAN.)쉽게 설명하는 GAN (What is this? Gum? It's GAN.)
쉽게 설명하는 GAN (What is this? Gum? It's GAN.)
 

Viewers also liked

Dynamics CRM 2013 Customization and Configuration
Dynamics CRM 2013 Customization and ConfigurationDynamics CRM 2013 Customization and Configuration
Dynamics CRM 2013 Customization and Configuration
Satish
 
Impacto De Las Tics En La Cultura De La Mediacion A Distancia Para La Educaci...
Impacto De Las Tics En La Cultura De La Mediacion A Distancia Para La Educaci...Impacto De Las Tics En La Cultura De La Mediacion A Distancia Para La Educaci...
Impacto De Las Tics En La Cultura De La Mediacion A Distancia Para La Educaci...GUILLERMO MOLINA JARA
 
B100 board presentation
B100 board presentationB100 board presentation
B100 board presentation
rjoana
 
Introducción a la biotecnología
Introducción a la biotecnologíaIntroducción a la biotecnología
Introducción a la biotecnologíabkt_6
 
Sånn akkurat passe mye arkitektur
Sånn akkurat passe mye arkitekturSånn akkurat passe mye arkitektur
Sånn akkurat passe mye arkitektur
Vegard Hartmann
 
Comentarios de Piedad Córdoba a columna de Vladdo
Comentarios de Piedad Córdoba a columna de VladdoComentarios de Piedad Córdoba a columna de Vladdo
Comentarios de Piedad Córdoba a columna de Vladdo
Poder Ciudadano
 
Tracxn Research — Online Photos Landscape, October 2016
Tracxn Research — Online Photos Landscape, October 2016Tracxn Research — Online Photos Landscape, October 2016
Tracxn Research — Online Photos Landscape, October 2016
Tracxn
 
9 icms instructions_for_authors
9 icms instructions_for_authors9 icms instructions_for_authors
9 icms instructions_for_authors
a2shafi
 
Ipsos Consumer Confidence Index April 2013
Ipsos Consumer Confidence Index April 2013Ipsos Consumer Confidence Index April 2013
Ipsos Consumer Confidence Index April 2013
Ipsos UK
 
Online marketing and distribution
Online marketing and distributionOnline marketing and distribution
Online marketing and distribution
Việt Long Plaza
 
인천펜션 노보텔호텔
인천펜션 노보텔호텔인천펜션 노보텔호텔
인천펜션 노보텔호텔
jdhfrter
 

Viewers also liked (12)

Dynamics CRM 2013 Customization and Configuration
Dynamics CRM 2013 Customization and ConfigurationDynamics CRM 2013 Customization and Configuration
Dynamics CRM 2013 Customization and Configuration
 
Impacto De Las Tics En La Cultura De La Mediacion A Distancia Para La Educaci...
Impacto De Las Tics En La Cultura De La Mediacion A Distancia Para La Educaci...Impacto De Las Tics En La Cultura De La Mediacion A Distancia Para La Educaci...
Impacto De Las Tics En La Cultura De La Mediacion A Distancia Para La Educaci...
 
B100 board presentation
B100 board presentationB100 board presentation
B100 board presentation
 
Introducción a la biotecnología
Introducción a la biotecnologíaIntroducción a la biotecnología
Introducción a la biotecnología
 
Sånn akkurat passe mye arkitektur
Sånn akkurat passe mye arkitekturSånn akkurat passe mye arkitektur
Sånn akkurat passe mye arkitektur
 
Comentarios de Piedad Córdoba a columna de Vladdo
Comentarios de Piedad Córdoba a columna de VladdoComentarios de Piedad Córdoba a columna de Vladdo
Comentarios de Piedad Córdoba a columna de Vladdo
 
AcreditacióN 9 SesióN
AcreditacióN 9 SesióNAcreditacióN 9 SesióN
AcreditacióN 9 SesióN
 
Tracxn Research — Online Photos Landscape, October 2016
Tracxn Research — Online Photos Landscape, October 2016Tracxn Research — Online Photos Landscape, October 2016
Tracxn Research — Online Photos Landscape, October 2016
 
9 icms instructions_for_authors
9 icms instructions_for_authors9 icms instructions_for_authors
9 icms instructions_for_authors
 
Ipsos Consumer Confidence Index April 2013
Ipsos Consumer Confidence Index April 2013Ipsos Consumer Confidence Index April 2013
Ipsos Consumer Confidence Index April 2013
 
Online marketing and distribution
Online marketing and distributionOnline marketing and distribution
Online marketing and distribution
 
인천펜션 노보텔호텔
인천펜션 노보텔호텔인천펜션 노보텔호텔
인천펜션 노보텔호텔
 

Similar to The PubChemQC Project

Is 20TB really Big Data?
Is 20TB really Big Data?Is 20TB really Big Data?
Is 20TB really Big Data?
NextMove Software
 
Substructure Search Face-off
Substructure Search Face-offSubstructure Search Face-off
Substructure Search Face-off
NextMove Software
 
Digitizing documents to provide a public spectroscopy database
Digitizing documents to provide a public spectroscopy databaseDigitizing documents to provide a public spectroscopy database
Digitizing documents to provide a public spectroscopy database
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
HPC Applications of Materials Modeling
HPC Applications of Materials ModelingHPC Applications of Materials Modeling
HPC Applications of Materials Modeling
George Fitzgerald
 
Morgan osg user school 2016 07-29 dist
Morgan osg user school 2016 07-29 distMorgan osg user school 2016 07-29 dist
Morgan osg user school 2016 07-29 dist
ddm314
 
Representing Chemicals Digitally: An overview of Cheminformatics
Representing Chemicals Digitally: An overview of CheminformaticsRepresenting Chemicals Digitally: An overview of Cheminformatics
Representing Chemicals Digitally: An overview of CheminformaticsJames Jeffryes
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...
Anubhav Jain
 
Efficient matching of multiple chemical subgraphs
Efficient matching of multiple chemical subgraphsEfficient matching of multiple chemical subgraphs
Efficient matching of multiple chemical subgraphs
NextMove Software
 
The physics of computational drug discovery
The physics of computational drug discoveryThe physics of computational drug discovery
The physics of computational drug discovery
Shourjya Sanyal
 
03j_nov18_n2.pptClassification of Parallel Computers.pptx
03j_nov18_n2.pptClassification of Parallel Computers.pptx03j_nov18_n2.pptClassification of Parallel Computers.pptx
03j_nov18_n2.pptClassification of Parallel Computers.pptx
Neeraj Singh
 
Approaches for extraction and digital chromatography of chemical data
Approaches for extraction and digital chromatography of chemical dataApproaches for extraction and digital chromatography of chemical data
Approaches for extraction and digital chromatography of chemical data
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Coordination InChI (2019)
Coordination InChI (2019)Coordination InChI (2019)
Coordination InChI (2019)
Alex Clark
 
Computer-Assisted Structure Elucidation (CloudMet 2017)
Computer-Assisted Structure Elucidation (CloudMet 2017)Computer-Assisted Structure Elucidation (CloudMet 2017)
Computer-Assisted Structure Elucidation (CloudMet 2017)
Christoph Steinbeck
 
Computational Chemistry: From Theory to Practice
Computational Chemistry: From Theory to PracticeComputational Chemistry: From Theory to Practice
Computational Chemistry: From Theory to Practice
David Thompson
 
Overview of cheminformatics
Overview of cheminformaticsOverview of cheminformatics
Overview of cheminformatics
Benjamin Bucior
 
Exploring Practices in Machine Learning and Machine Discovery for Heterogeneo...
Exploring Practices in Machine Learning and Machine Discovery for Heterogeneo...Exploring Practices in Machine Learning and Machine Discovery for Heterogeneo...
Exploring Practices in Machine Learning and Machine Discovery for Heterogeneo...
Ichigaku Takigawa
 
01-10 Exploring new high potential 2D materials - Angioni.pdf
01-10 Exploring new high potential 2D materials - Angioni.pdf01-10 Exploring new high potential 2D materials - Angioni.pdf
01-10 Exploring new high potential 2D materials - Angioni.pdf
OCRE | Open Clouds for Research Environments
 
OpenDiscovery
OpenDiscoveryOpenDiscovery
OpenDiscoverygwprice
 

Similar to The PubChemQC Project (20)

Is 20TB really Big Data?
Is 20TB really Big Data?Is 20TB really Big Data?
Is 20TB really Big Data?
 
Substructure Search Face-off
Substructure Search Face-offSubstructure Search Face-off
Substructure Search Face-off
 
Digitizing documents to provide a public spectroscopy database
Digitizing documents to provide a public spectroscopy databaseDigitizing documents to provide a public spectroscopy database
Digitizing documents to provide a public spectroscopy database
 
HPC Applications of Materials Modeling
HPC Applications of Materials ModelingHPC Applications of Materials Modeling
HPC Applications of Materials Modeling
 
Morgan osg user school 2016 07-29 dist
Morgan osg user school 2016 07-29 distMorgan osg user school 2016 07-29 dist
Morgan osg user school 2016 07-29 dist
 
OOW-IMC-final
OOW-IMC-finalOOW-IMC-final
OOW-IMC-final
 
Representing Chemicals Digitally: An overview of Cheminformatics
Representing Chemicals Digitally: An overview of CheminformaticsRepresenting Chemicals Digitally: An overview of Cheminformatics
Representing Chemicals Digitally: An overview of Cheminformatics
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...
 
Efficient matching of multiple chemical subgraphs
Efficient matching of multiple chemical subgraphsEfficient matching of multiple chemical subgraphs
Efficient matching of multiple chemical subgraphs
 
The physics of computational drug discovery
The physics of computational drug discoveryThe physics of computational drug discovery
The physics of computational drug discovery
 
1371 silver[1]
1371 silver[1]1371 silver[1]
1371 silver[1]
 
03j_nov18_n2.pptClassification of Parallel Computers.pptx
03j_nov18_n2.pptClassification of Parallel Computers.pptx03j_nov18_n2.pptClassification of Parallel Computers.pptx
03j_nov18_n2.pptClassification of Parallel Computers.pptx
 
Approaches for extraction and digital chromatography of chemical data
Approaches for extraction and digital chromatography of chemical dataApproaches for extraction and digital chromatography of chemical data
Approaches for extraction and digital chromatography of chemical data
 
Coordination InChI (2019)
Coordination InChI (2019)Coordination InChI (2019)
Coordination InChI (2019)
 
Computer-Assisted Structure Elucidation (CloudMet 2017)
Computer-Assisted Structure Elucidation (CloudMet 2017)Computer-Assisted Structure Elucidation (CloudMet 2017)
Computer-Assisted Structure Elucidation (CloudMet 2017)
 
Computational Chemistry: From Theory to Practice
Computational Chemistry: From Theory to PracticeComputational Chemistry: From Theory to Practice
Computational Chemistry: From Theory to Practice
 
Overview of cheminformatics
Overview of cheminformaticsOverview of cheminformatics
Overview of cheminformatics
 
Exploring Practices in Machine Learning and Machine Discovery for Heterogeneo...
Exploring Practices in Machine Learning and Machine Discovery for Heterogeneo...Exploring Practices in Machine Learning and Machine Discovery for Heterogeneo...
Exploring Practices in Machine Learning and Machine Discovery for Heterogeneo...
 
01-10 Exploring new high potential 2D materials - Angioni.pdf
01-10 Exploring new high potential 2D materials - Angioni.pdf01-10 Exploring new high potential 2D materials - Angioni.pdf
01-10 Exploring new high potential 2D materials - Angioni.pdf
 
OpenDiscovery
OpenDiscoveryOpenDiscovery
OpenDiscovery
 

More from Maho Nakata

quantum chemistry on quantum computer handson by Q# (2019/8/4@MDR Hongo, Tokyo)
quantum chemistry on quantum computer handson by Q# (2019/8/4@MDR Hongo, Tokyo)quantum chemistry on quantum computer handson by Q# (2019/8/4@MDR Hongo, Tokyo)
quantum chemistry on quantum computer handson by Q# (2019/8/4@MDR Hongo, Tokyo)
Maho Nakata
 
Lie-Trotter-Suzuki分解、特にフラクタル分解について
Lie-Trotter-Suzuki分解、特にフラクタル分解についてLie-Trotter-Suzuki分解、特にフラクタル分解について
Lie-Trotter-Suzuki分解、特にフラクタル分解について
Maho Nakata
 
LiHのポテンシャルエネルギー曲面 を量子コンピュータで行う Q#+位相推定編
LiHのポテンシャルエネルギー曲面 を量子コンピュータで行う Q#+位相推定編LiHのポテンシャルエネルギー曲面 を量子コンピュータで行う Q#+位相推定編
LiHのポテンシャルエネルギー曲面 を量子コンピュータで行う Q#+位相推定編
Maho Nakata
 
Q#による量子化学計算 : 水素分子の位相推定について
Q#による量子化学計算 : 水素分子の位相推定についてQ#による量子化学計算 : 水素分子の位相推定について
Q#による量子化学計算 : 水素分子の位相推定について
Maho Nakata
 
量子コンピュータの量子化学計算への応用の現状と展望
量子コンピュータの量子化学計算への応用の現状と展望量子コンピュータの量子化学計算への応用の現状と展望
量子コンピュータの量子化学計算への応用の現状と展望
Maho Nakata
 
qubitによる波動関数の虚時間発展のシミュレーション: a review
qubitによる波動関数の虚時間発展のシミュレーション: a reviewqubitによる波動関数の虚時間発展のシミュレーション: a review
qubitによる波動関数の虚時間発展のシミュレーション: a review
Maho Nakata
 
Openfermionを使った分子の計算 part I
Openfermionを使った分子の計算 part IOpenfermionを使った分子の計算 part I
Openfermionを使った分子の計算 part I
Maho Nakata
 
量子コンピュータで量子化学のfullCIが超高速になる(かも
量子コンピュータで量子化学のfullCIが超高速になる(かも量子コンピュータで量子化学のfullCIが超高速になる(かも
量子コンピュータで量子化学のfullCIが超高速になる(かも
Maho Nakata
 
20180723 量子コンピュータの量子化学への応用; Bravyi-Kitaev基底の実装
20180723 量子コンピュータの量子化学への応用; Bravyi-Kitaev基底の実装20180723 量子コンピュータの量子化学への応用; Bravyi-Kitaev基底の実装
20180723 量子コンピュータの量子化学への応用; Bravyi-Kitaev基底の実装
Maho Nakata
 
第11回分子科学 2017/9/17 Pubchemqcプロジェクト
第11回分子科学 2017/9/17 Pubchemqcプロジェクト第11回分子科学 2017/9/17 Pubchemqcプロジェクト
第11回分子科学 2017/9/17 Pubchemqcプロジェクト
Maho Nakata
 
計算化学実習講座:第二回
 計算化学実習講座:第二回 計算化学実習講座:第二回
計算化学実習講座:第二回
Maho Nakata
 
計算化学実習講座:第一回
計算化学実習講座:第一回計算化学実習講座:第一回
計算化学実習講座:第一回
Maho Nakata
 
HOKUSAIのベンチマーク 理研シンポジウム 中田分
HOKUSAIのベンチマーク 理研シンポジウム 中田分HOKUSAIのベンチマーク 理研シンポジウム 中田分
HOKUSAIのベンチマーク 理研シンポジウム 中田分
Maho Nakata
 
為替取引(FX)でのtickdataの加工とMySQLで管理
為替取引(FX)でのtickdataの加工とMySQLで管理為替取引(FX)でのtickdataの加工とMySQLで管理
為替取引(FX)でのtickdataの加工とMySQLで管理
Maho Nakata
 
為替のTickdataをDukascopyからダウンロードする
為替のTickdataをDukascopyからダウンロードする為替のTickdataをDukascopyからダウンロードする
為替のTickdataをDukascopyからダウンロードする
Maho Nakata
 
HPCS2015 pythonを用いた量子化学プログラムの開発と応用
HPCS2015 pythonを用いた量子化学プログラムの開発と応用HPCS2015 pythonを用いた量子化学プログラムの開発と応用
HPCS2015 pythonを用いた量子化学プログラムの開発と応用
Maho Nakata
 
HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)
HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)
HPCS2015 大規模量子化学計算プログラムSMASHの開発と公開(石村)
Maho Nakata
 
3Dプリンタ導入記 タンパク質の模型をプリントする
3Dプリンタ導入記 タンパク質の模型をプリントする3Dプリンタ導入記 タンパク質の模型をプリントする
3Dプリンタ導入記 タンパク質の模型をプリントする
Maho Nakata
 
立教大学化学実験3 SMILESを中心とした高度な分子モデリング 2014/7/1
立教大学化学実験3 SMILESを中心とした高度な分子モデリング 2014/7/1 立教大学化学実験3 SMILESを中心とした高度な分子モデリング 2014/7/1
立教大学化学実験3 SMILESを中心とした高度な分子モデリング 2014/7/1
Maho Nakata
 
The PubchemQC project
The PubchemQC projectThe PubchemQC project
The PubchemQC project
Maho Nakata
 

The PubChemQC Project

  • 1. The PubChemQC Project: A big data construction by first-principles calculations of molecules
      中田真秀 (NAKATA Maho), ACCC RIKEN, maho@riken.jp
      2014/12/3 10:35-11:05, JST CREST International Symposium on Post Petascale System Software
  • 2. Background
      • All matter is composed of atoms and molecules.
      • The dream of theoretical chemists: do chemistry without experiments, on computers!
      • We treat big data in chemistry!
          – Chemical space is really huge!
      • The number of candidate drug molecules: 10^60 (http://onlinelibrary.wiley.com/doi/10.1002/wcms.1104/abstract)
      • Cf. exa: 10^18
  • 3. Current status of computational chemistry
      • Relatively good agreement with experiments.
      • Can explain nature in many cases.
          – Many good quantum chemistry programs are available!
          – “DFT B3LYP 6-31G*” calculations rule!
      • We want to lead chemistry
          – So far, we only explain what has already happened.
  • 4. Difference between experiment and calculation/theory
      • Finding an interesting phenomenon or problem
          – How do we convert CO2 to O2? N2 + H2 to NH3?
          – How do we synthesize a compound?
      • Design a key chemical reaction.
      • Calculations or experiments (the only difference)
      • Analyze
          – Analysis of results
          – Propose new experiments
  • 5. Difference between experiment and calculation/theory
      • No difference as science
      • The most important thing is curiosity!
      • New insights come from big data and my own intuition!
      • Unfortunately, there are not many easy-to-use big data sets for chemistry.
  • 6. Googling molecules, plus giving you recommended molecules!
  • 7. What is needed for Googling molecules?
      1. Types, kinds, and variety of molecules
          – The number of molecules is infinite, but we can cover the important ones
      2. Required properties of molecules
          – Molecular structure, energy, UV excitation energy, dipole moment
      3. Getting properties of molecules by calculation?
          – Accuracy of calculation, and computer resources…
      4. Coding or encoding molecules
          – IUPAC nomenclature is not suitable
          – Do not think about graph theory
  • 8. Databases for lists of molecules
      • PubChem: 50,000,000 molecules listed, made by NIH, public domain, no curation (imported from catalogs, etc.), obtainable via ftp.
      • ChemSpider: 28,000,000 entries, better curation, no ftp. Redistribution and download are restricted.
      • Web-GDB13: 900,000,000 entries, just generated by combinatorics. No
      • ZINC, ChEMBL, DrugBank …
      • CAS: 70,000,000 molecules, proprietary
      • Nikkaji: 6,000,000, proprietary
      • We use PubChem as our source of molecules.
  • 10. Ex. A molecule listed in PubChem
  • 11. Database for molecular properties by experiments
      • We must do experiments to obtain molecular properties.
          – No free comprehensive database is known so far.
          – Pharmaceutical companies do O(1,000,000) experiments for high-throughput screening.
      • Experiments are hugely expensive!
          – Time consuming, large facilities, costs, hazards
      • We do not do experiments!
  • 12. Database for molecular properties by computer calculation
      • Gold-standard method: “Density functional theory (B3LYP functional) + 6-31G(d) basis set”
          – Accuracy is quite satisfactory (1-10 kcal/mol) for biological systems and organic chemistry.
          – Good implementations are available.
          – Costs less (fast, needs only a supercomputer, no hazards)
          – Calculation time keeps decreasing
      • Intel Core i7 (esp. Sandy Bridge) is very fast.
      • Still, we need huge resources.
      • We calculate by computer instead!
  • 13. What is a molecule?
      • No rigorous definition for a molecule
      • [Figure, from Wikipedia: representations of propionaldehyde (wavefunction, 3D coordinates, structure, IUPAC nomenclature, common name), ranging from "hard to understand but rigorous" to "easy to understand but many corner cases"]
  • 14. What is a molecule?
      • No rigorous definition for “what is a molecule”
      • Nomenclature
          – 3D coordinates of the nuclei
          – Structural formula
          – IUPAC nomenclature
          – Higher abstraction or less abstraction?
      • A better molecular encoding method?
          – Easy for humans to understand
          – Easy for computers to understand as well
          – Can describe most cases, with fewer corner cases.
          – A compromise between dream and reality
  • 15. Encoding molecules: SMILES
      • IUPAC nomenclature
          – tert-butyl N-[(2S,3S,5S)-5-[[4-[(1-benzyltetrazol-5-yl)methoxy]phenyl]methyl]-3-hydroxy-6-[[(1S,2R)-2-hydroxy-2,3-dihydro-1H-inden-1-yl]amino]-6-oxo-1-phenylhexan-2-yl]carbamate
      • We can encode molecules
      • SMILES
          – CN(C)CCOC12CCC(C3C1CCCC3)C4=CC=CC=C24
      • InChI (made by IUPAC)
          – InChI=1S/C20H29NO/c1-21(2)13-14-22-20-12-11-15(16-7-3-5-9-18(16)20)17-8-4-6-10-19(17)20/h3,5,7,9,15,17,19H,4,6,8,10-14H2,1-2H3
      • …
      • SMILES is a good encoding method for molecules
  • 16. What is SMILES?
      • Simplified Molecular Input Line Entry System
          – A linear representation of a molecule using ASCII.
          – Conformation is also encoded
          – Human readable, and also machine readable.
          – Almost one-to-one mapping between a molecule and SMILES via Universal SMILES
      • David Weininger at the USEPA Mid-Continent Ecology Division Laboratory invented SMILES
      • InChI by IUPAC
          – International Chemical Identifier: open standard (non-proprietary)
          – N. M. O’Boyle invented “Universal SMILES” via InChI
  • 17. Examples of SMILES (http://en.wikipedia.org/wiki/SMILES)
      Molecule (structure): SMILES
      • Nitrogen molecule (N≡N): N#N
      • Copper sulfate (Cu²⁺ SO₄²⁻): [Cu+2].[O-]S(=O)(=O)[O-]
      • Oenanthotoxin: CCC[C@@H](O)CCC=CC=CC#CC#CC=CCO
      • Vitamin B1: OCCc1c(C)[n+](=cs1)Cc2cnc(C)nc(N)2
      • Aflatoxin B1: O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5
  • 18. Some corner cases
      Two different SMILES for ferrocene:
      • C12C3C4C5C1[Fe]23451234C5C1C2C3C45
      • [CH-]1C=CC=C1.[CH-]1C=CC=C1.[Fe+2]
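The "machine readable" point and the ferrocene corner case can be illustrated with a small sketch. This is a deliberately simplified tokenizer written for this note, not a real SMILES parser: the function name `atom_tokens` and the regular expression are assumptions, and it only recognizes bracket atoms plus the common organic-subset symbols.

```python
import re

# Simplified pattern (an assumption, not the full SMILES grammar):
# bracket atoms first, then two-letter halogens, then one-letter
# organic-subset atoms (upper case = aliphatic, lower case = aromatic).
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def atom_tokens(smiles):
    """Return the atom tokens found in a SMILES string, in order."""
    return ATOM_PATTERN.findall(smiles)


# The two ferrocene encodings quoted on the slide.
ferrocene_a = "C12C3C4C5C1[Fe]23451234C5C1C2C3C45"
ferrocene_b = "[CH-]1C=CC=C1.[CH-]1C=CC=C1.[Fe+2]"

if __name__ == "__main__":
    print(atom_tokens("N#N"))
    print(atom_tokens("[Cu+2].[O-]S(=O)(=O)[O-]"))
    # Same number of atoms, but the strings (and the bonding they describe) differ.
    print(len(atom_tokens(ferrocene_a)), len(atom_tokens(ferrocene_b)))
```

Both ferrocene strings contain the same atoms, yet a plain string comparison treats them as different molecules, which is why a canonical form such as Universal SMILES matters for a database key.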
  • 19. Now it's my turn
  • 20. Construction of an ab initio chemical database
      • Molecular information is from PubChem
      • Properties are calculated from first principles by computer
          – Many program packages are available
          – DFT (B3LYP)
          – 6-31G(d) basis set and geometry optimization
          – Excited-state calculation by TD-DFT 6-31+G(d)
          – Best for organic molecules or biomolecules
      • Molecular encoding: SMILES / InChI
      • Huge computer resources
      • A dream come true
          – A Google-like search engine for chemistry
  • 21. The PubChemQC Project
      • http://pubchemqc.riken.jp/
      • An open database for molecules
          – Public domain
      • Ab initio (first-principles) calculation of molecular properties of PubChem compounds
      • 2014/1/15: 13,000 molecules
      • 2014/7/29: 155,792 molecules
      • 2014/10/30: 906,798 molecules
      • 2014/12/3: 1,137,286 molecules
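The four snapshot counts above imply an average throughput for each interval. The following is a small sketch of that arithmetic; the function name `daily_rates` is made up for illustration, and the numbers are taken directly from the slide.

```python
from datetime import date

# Snapshot dates and molecule counts as quoted on the slide.
snapshots = [
    (date(2014, 1, 15), 13_000),
    (date(2014, 7, 29), 155_792),
    (date(2014, 10, 30), 906_798),
    (date(2014, 12, 3), 1_137_286),
]


def daily_rates(points):
    """Average molecules added per day between consecutive snapshots."""
    rates = []
    for (d0, n0), (d1, n1) in zip(points, points[1:]):
        days = (d1 - d0).days
        rates.append((d0, d1, (n1 - n0) / days))
    return rates


if __name__ == "__main__":
    for start, end, rate in daily_rates(snapshots):
        print(f"{start} to {end}: about {rate:,.0f} molecules/day")
```

The averages are only as reliable as the snapshot counts themselves; they say nothing about day-to-day variation in available computer time.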
  • 22. The PubChemQC project: http://pubchemqc.riken.jp/ (work in progress: no search engine yet, just data)
  • 25. Related works
      • NIST WebBook
          – http://webbook.nist.gov/chemistry/
          – Small number of molecules. Compares many methods
      • Harvard Clean Energy Project
          – http://cleanenergy.molecularspace.org/
          – 25,000,000 (?) molecules for photovoltaic devices, generated by combinatorics
      • Sugimoto et al.: 2013 CBI symposium poster
          – Almost the same as our database, currently not open to the public (now??)
  • 26. How do we do it?
      • Generate the initial 3D conformation with OpenBABEL
          – The SDF contains a 3D conformation, but we don’t use it.
          – OpenBABEL -h (add hydrogens) --gen3d (generate 3D coordinates)
      • Ab initio calculation by GAMESS + Firefly
          – Using Gaussian can lead to a political problem(?)
          – PM3 optimization
          – Hartree-Fock/STO-6G geometry optimization
          – Firefly + GAMESS geometry optimization in B3LYP/6-31G*
          – Ten excitation energies by TDDFT/6-31+G* (no geometry optimization)
  • 27. How do we do it?
      • Heavy use of OpenBABEL
      • Extraction of molecular information
          – Sort PubChem compounds by molecular weight
          – OpenBABEL
      • Encoded by SMILES
          – Isomeric SMILES: 3D conformation retained
          – OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)1
          – CCC[C@@H](O)CCC=CC=CC#CC#CC=CCO
          – CC(=O)OCCC(/C)=CC[C@H](C(C)=C)CCC=C
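The first step, going from a SMILES string to an initial 3D structure with hydrogens added, might be scripted roughly as follows. This is a sketch under assumptions, not the project's actual script: the helper names and the output file name are hypothetical, and only the `-h` and `--gen3d` options come from the slides. The `-:` (inline SMILES input), `-omol` and `-O` options are standard `obabel` command-line usage.

```python
import shutil
import subprocess


def obabel_command(smiles, out_path):
    """Build an obabel invocation: SMILES in, MOL file with H atoms and 3D coordinates out."""
    return ["obabel", f"-:{smiles}", "-omol", "-O", out_path, "-h", "--gen3d"]


def generate_3d(smiles, out_path):
    """Run obabel if it is installed; return True when the command was executed."""
    if shutil.which("obabel") is None:
        return False  # Open Babel is not on PATH, so nothing is run
    subprocess.run(obabel_command(smiles, out_path), check=True)
    return True


if __name__ == "__main__":
    # Oenanthotoxin, one of the isomeric SMILES quoted above; the file name is arbitrary.
    smiles = "CCC[C@@H](O)CCC=CC=CC#CC#CC=CCO"
    print(" ".join(obabel_command(smiles, "oenanthotoxin.mol")))
    print("ran obabel:", generate_3d(smiles, "oenanthotoxin.mol"))
```

The resulting MOL file would then serve as the starting geometry for the PM3, Hartree-Fock and B3LYP optimization steps listed above.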
  • 28. Our way from a PubChem compound to a quantum chemistry calculation
      • Example: aflatoxin
          – O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5
      • 3D structure generated by OpenBABEL, then ab initio calculation
  • 29. Final results will be
      • Uploaded to http://pubchemqc.riken.jp/
      • Currently we upload
          – Input file (ground / excited state)
          – Output file (ground / excited state)
          – Final geometry in a Mol file
  • 30. Scaling of computation
      • Embarrassingly parallel for each molecule
      • Very roughly speaking, the required calculation time scales like N^4
          – N: molecular weight
      • The problems are very hard (complexity theory)
          – Hartree-Fock calculation
          – DFT (B3LYP) calculation
          – Geometry optimization
      • In practice, many molecules can be solved efficiently
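To get a feel for what N^4 scaling means in practice, here is a minimal sketch that estimates relative cost against a reference molecular weight. The function name and the reference value of 160 are assumptions chosen for illustration, and the exponent is the slide's "very rough" estimate rather than a measured one.

```python
def relative_cost(mw, reference_mw=160.0, exponent=4.0):
    """Cost of a molecule of weight `mw` relative to one of weight `reference_mw`."""
    return (mw / reference_mw) ** exponent


if __name__ == "__main__":
    for mw in (160, 320, 500):
        print(f"MW {mw}: about {relative_cost(mw):.1f}x the reference cost")
```

Doubling the molecular weight multiplies the estimated cost by 16, which is why throughput drops sharply as the project moves to heavier compounds.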
  • 31. Computer Resources
      • RICC: (Intel Xeon 5570 Westmere, 2.93 GHz, 8 cores/node) x 1000
          – 1000-10000 molecules/day (MW 160)
          – Depends heavily on the load from other users
          – Time limit: 8 hours
      • Quest: Intel Core2 Duo (1.6 GHz/node) x 700
          – 3000-8000 molecules/day (MW 160)
          – 100-1000 molecules/day (MW 200-300)
          – Time limit: 20 hours
      • Compounds whose calculations fail are ignored for now.
  • 32. Computer Resources
      • Storage
          – Approx. 500 GB for 1,000,000 molecules (xz compressed)
          – Approx. 20 TB for 40,000,000 molecules (xz compressed)
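The two storage figures are consistent with a simple linear extrapolation, as this short check shows. It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and that the average compressed size per molecule stays constant; both are assumptions, and the variable names are made up for this sketch.

```python
GB = 10**9
TB = 10**12

# Measured point from the slide: about 500 GB for 1,000,000 molecules.
bytes_per_molecule = 500 * GB // 1_000_000

# Linear extrapolation to 40,000,000 molecules.
projected_tb = bytes_per_molecule * 40_000_000 / TB

if __name__ == "__main__":
    print(f"about {bytes_per_molecule / 10**6:.1f} MB per molecule (xz compressed)")
    print(f"about {projected_tb:.0f} TB for 40,000,000 molecules")
```

Since heavier molecules produce larger output files, the real total could exceed this linear estimate.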
  • 33. Molecular weight and the Lipinski rule
      • Lipinski’s rule of five (Pfizer's rule of five): a rule of thumb for drug discovery
          – No more than 5 hydrogen bond donors
          – No more than 10 hydrogen bond acceptors
          – A molecular mass less than 500 daltons
          – An octanol-water partition coefficient log P not greater than 5
      • A molecular weight below 500 suits computational chemistry very well
          – For routine calculations without experimental data other than the molecular formula
          – Above 500, secondary or higher-order structure becomes important, e.g., proteins
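The four criteria translate directly into a filter. The sketch below assumes the descriptors are already available (for example from a cheminformatics toolkit); the class, the function, and the two example molecules are hypothetical and their values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    h_bond_donors: int
    h_bond_acceptors: int
    molecular_weight: float  # daltons
    log_p: float             # octanol-water partition coefficient


def lipinski_violations(d):
    """Return the rule-of-five criteria (as listed on the slide) that `d` violates."""
    violations = []
    if d.h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if d.h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    if d.molecular_weight >= 500:
        violations.append("molecular mass of 500 daltons or more")
    if d.log_p > 5:
        violations.append("log P greater than 5")
    return violations


# Hypothetical descriptor values, not taken from any database.
small = Descriptors(h_bond_donors=2, h_bond_acceptors=4, molecular_weight=310.0, log_p=2.5)
large = Descriptors(h_bond_donors=7, h_bond_acceptors=12, molecular_weight=650.0, log_p=6.1)

if __name__ == "__main__":
    print(lipinski_violations(small))
    print(lipinski_violations(large))
```

For the purposes of this project only the molecular weight criterion is used as a cutoff; the other three are shown for completeness.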
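  • 34. Molecular weight distribution at PubChem
      • [Figure: histogram of molecular weights in PubChem, with the Lipinski limit (MW = 500) and our current position ("We are still here") marked]
      • 30,000,000 molecules (excluding mixtures)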
  • 35. How long will it take to finish?
      • For drug design, we need to calculate all molecules with MW < 500
      • Total: 30,000,000 molecules
          – This number may increase in the future
      • Current (2014/12/4): 1,100,000 molecules
          – Only about 3.7%
      • 10,000 molecules/day -> 8.2 years
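The 8.2-year figure follows from dividing the total by the daily rate. A small sketch of that estimate, with a hypothetical helper name and a year taken as 365.25 days:

```python
def years_to_finish(total, done, per_day):
    """Years needed to process the remaining molecules at a constant daily rate."""
    return (total - done) / per_day / 365.25


if __name__ == "__main__":
    total, done, per_day = 30_000_000, 1_100_000, 10_000
    print(f"fraction done: {done / total:.1%}")
    print(f"from scratch: {years_to_finish(total, 0, per_day):.1f} years")
    print(f"remaining:    {years_to_finish(total, done, per_day):.1f} years")
```

Counting only the remaining molecules shortens the estimate slightly, to about 7.9 years. Both numbers assume the rate stays at 10,000 molecules per day, which the N^4 scaling makes optimistic for heavier compounds on fixed hardware.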
  • 36. How long will it take to finish?
      • 10+ years? No, maybe far less.
      • 25 years ago (1990), computers were very slow
          – Even ab initio calculations were very difficult on a 486DX@25MHz or 68000@10MHz
  • 37. Outlook, prospect, hope…
      • Far better in silico screening
          – Fewer or no experiments are necessary
      • Even faster calculation using machine learning
          – 10,000 molecules/second?
          – Using our data as the training set.
          – Not difficult for bio or organic molecules
          – Far better initial guess
      • Database for chemical reactions
          – Precise calculation is required
          – GRRM method + machine learning (?)
      • Geometry optimization for proteins (PDB)
          – Only X-ray crystal structures are available
      http://pubchemqc.riken.jp/
  • 38. Difficulties in this project
      • Parameters needed for the calculations vary from molecule to molecule
      • Properties can differ depending on the initial guess
      • Computer resources
          – Raspberry Pi? NVIDIA Jetson? BOINC?
      • Molecular encoding never ends
          – SMILES or InChI is not complete
          – Some corner cases may be chemically interesting.