The PubChemQC project. In this project we run quantum chemistry calculations for the molecules listed in PubChem. Currently, results for 1,100,000 molecules are available at http://pubchemqc.riken.jp/. The results are in the public domain.
1. The PubChemQC Project
A big data construction by first-principles
calculations of molecules
中田真秀(NAKATA Maho)
ACCC RIKEN
maho@riken.jp
2014/12/3 10:35-11:05
JST CREST International Symposium on Post
Petascale System Software
2. Background
• All matter is composed of atoms and molecules.
• The dream of theoretical chemists: do chemistry
without experiments!
• On computers
• We treat big data in chemistry!
– Chemical space is really huge!
• The number of drug candidates: 10^60
(http://onlinelibrary.wiley.com/doi/10.1002/wcms.1104/abstract)
• Cf. Exa: 10^18
3. Current status of computational
chemistry
• Relatively good agreement with experiments.
• Can explain nature in many cases.
– Many good quantum chemistry programs are
available!
– “DFT B3LYP 6-31G*” calculations rule!
• We want to lead chemistry
– So far, we only explain what has already happened.
4. Difference between experiment and
calculation/theory
• Finding interesting phenomena or problem
– How do we convert CO2 to O2? N2 + H2 to NH3?
– How to synthesize a compound?
• Design a key chemical reaction.
• Calculations
• Experiments
– Analyze
• Analysis of results
• Propose new experiments
Only One Difference
5. Difference between experiment and
calculation/theory
• No difference as science
• Most important thing is curiosity!
New insights from
big data and
my sensitivity!
Unfortunately, there are not many easy-to-use
big data sets for chemistry
7. What is needed for Googling molecules?
1. Types, kinds, and variety of molecules
– The number of molecules is infinite, but we can cover the important ones
2. Required properties of molecules
– Molecular structure, energy, UV excitation energy,
dipole moment
3. Getting properties of molecules by calculation?
– Accuracy of calculation, and computer resources…
4. Coding or encoding molecules
– IUPAC nomenclature is not suitable
– Do not think about graph theory
8. Databases for lists of molecules
• PubChem: 50,000,000 molecules listed, made by NIH,
public domain, not curated (imported from catalogs,
etc.), available via ftp.
• ChemSpider: 28,000,000 entries, better curated, no
ftp. Redistribution and download are restricted.
• Web-GDB13: 900,000,000 entries, generated purely by
combinatorics. No
• ZINC, ChEMBL, DrugBank …
• CAS: 70,000,000 molecules, proprietary
• Nikkaji: 6,000,000, proprietary
We use PubChem as the source of molecules
11. Database for molecular properties by
experiments
• Obtaining molecular properties requires
experiments.
– No free comprehensive database is known so far.
– Pharmaceutical companies do O(1,000,000)
experiments for high-throughput screening.
• Experiments are hugely expensive!
– Time-consuming, need large facilities, costly, hazardous
We do not do experiments!
12. Database for molecular properties by computer
calculation
• Gold-standard method: “Density functional
theory (B3LYP functional) + 6-31G(d) basis set”
– Accuracy is quite satisfactory (1-10 kcal/mol) for
biological systems and organic chemistry.
– Good implementations are available.
– Costs less (fast, needs only a supercomputer, no hazards)
– Calculation time keeps decreasing
• Intel Core i7 (esp. Sandy Bridge) is very fast.
• Still we need huge resources, though.
We calculate by computer instead!
13. What is a molecule?
No rigorous definition for a molecule
[Figure: representations of propionaldehyde (wavefunction, 3D coordinates, structure, IUPAC nomenclature, common name), ranging from "hard to understand but rigorous" to "easy to understand but many corner cases". Image from Wikipedia.]
14. What is a molecule?
• No rigorous definition for “what is a molecule”
• Ways to name or describe a molecule
– 3D coordinates of the nuclei
– Structural formula
– IUPAC nomenclature
– Higher abstraction or less abstraction?
• Better molecular encoding method?
– Easy to understand for humans
– Easy to understand for computers as well
– Can describe most cases, with few corner cases.
– Compromise between dream and reality
15. Encoding molecules: SMILES
Encoding molecules
IUPAC nomenclature
tert-butyl N-[(2S,3S,5S)-5-[[4-[(1-benzyltetrazol-5-yl)methoxy]phenyl]methyl]-3-hydroxy-6-[[(1S,2R)-2-hydroxy-2,3-dihydro-1H-inden-1-yl]amino]-6-oxo-1-phenylhexan-2-yl]carbamate
We can encode molecules
• SMILES
CN(C)CCOC12CCC(C3C1CCCC3)C4=CC=CC=C24
• InChI (made by IUPAC)
InChI=1S/C20H29NO/c1-21(2)13-14-22-20-12-11-15(16-7-3-5-9-18(16)20)17-8-4-6-10-19(17)20/h3,5,7,9,15,17,19H,4,6,8,10-14H2,1-2H3
…
SMILES is a good encoding method for molecules
16. What is SMILES?
• Simplified Molecular Input Line Entry System
– A linear representation of a molecule using ASCII.
– Stereochemical configuration can also be encoded (isomeric SMILES)
– Human readable, and also machine readable.
– Almost one-to-one mapping between a molecule and
SMILES via Universal SMILES
• David Weininger at the USEPA Mid-Continent Ecology Division Laboratory invented SMILES
• InChI by IUPAC
– International Chemical Identifier: open standard (non-proprietary)
– N. M. O’Boyle invented “Universal SMILES” via InChI
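As a small illustration of SMILES and InChI being machine readable, the sketch below counts heavy (non-hydrogen) atoms in a SMILES string and compares them with the formula layer of an InChI. The function names and the simplified tokenizer are illustrative assumptions, not part of the project; a real toolkit such as Open Babel handles far more of the SMILES grammar.

```python
import re
from collections import Counter

# Matches, in order: a bracket atom such as [C@@H], a two-letter
# organic-subset element, a one-letter aliphatic element, or an aromatic atom.
ATOM_PATTERN = re.compile(r"\[([^\]]+)\]|(Cl|Br)|([BCNOPSFI])|([bcnops])")


def heavy_atom_counts(smiles):
    """Count non-hydrogen atoms in a SMILES string (simplified tokenizer)."""
    counts = Counter()
    for bracket, two_letter, aliphatic, aromatic in ATOM_PATTERN.findall(smiles):
        if bracket:
            # Inside brackets: optional isotope digits, then the element symbol.
            symbol = re.match(r"\d*([A-Z][a-z]?|[a-z])", bracket).group(1)
        else:
            symbol = two_letter or aliphatic or aromatic
        symbol = symbol.capitalize()
        if symbol != "H":
            counts[symbol] += 1
    return counts


def inchi_heavy_atom_counts(inchi):
    """Read the formula layer of an InChI and drop hydrogen."""
    formula = inchi.split("/")[1]  # e.g. "C20H29NO"
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    counts.pop("H", None)
    return counts


if __name__ == "__main__":
    smiles = "CN(C)CCOC12CCC(C3C1CCCC3)C4=CC=CC=C24"
    print(dict(heavy_atom_counts(smiles)))
```

Both encodings of the example molecule from slide 15 should yield the same heavy-atom counts, which is a cheap consistency check when converting between formats.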
20. Construction of an ab initio chemical
database
• Molecular information is from PubChem
• Properties are calculated from first principles by
computer
– Many program packages are available
– DFT (B3LYP)
– 6-31G(d) basis set and geometry optimization
– Excited-state calculation by TD-DFT/6-31+G(d)
– Best for organic molecules or biomolecules
• Molecular encoding: SMILES / InChI
• Huge computer resources
• A dream come true
– Google-like search engine for chemistry
21. The PubChemQC Project
• http://pubchemqc.riken.jp/
• An open database for molecules
– Public domain
• Ab initio (first-principles) calculation of
molecular properties of PubChem compounds
• 2014/1/15: 13,000 molecules
• 2014/7/29 : 155,792 molecules
• 2014/10/30 : 906,798 molecules
• 2014/12/3 : 1,137,286 molecules
25. Related works
• Related works
– NIST Chemistry WebBook
• http://webbook.nist.gov/chemistry/
• Small number of molecules. Compares many methods
– Harvard Clean Energy Project
• http://cleanenergy.molecularspace.org/
• 25,000,000 (?) molecules for photo devices, made by
combinatorics
– Sugimoto et al.: 2013 CBI symposium poster
• Almost the same as our database, currently not open to the
public (now??)
26. How do we do it?
• Generate the initial 3D conformation with OpenBABEL
– The SDF contains a 3D conformation, but we don’t use it.
– OpenBABEL -h (add hydrogens) --gen3d (generate 3D
coordinates)
• Ab initio calculation by GAMESS + Firefly
– Using Gaussian can lead to a political problem(?)
– PM3 optimization
– Hartree-Fock/STO-6G geometry optimization
– Firefly + GAMESS geometry optimization in B3LYP/6-31G*
– Ten excitation energies by TDDFT/6-31+G* (no geometry
optimization)
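The first step above (SMILES to an initial 3D geometry) could be scripted roughly as follows. This is a minimal sketch, assuming the `obabel` command-line tool from Open Babel is installed; the function names, the output file name, and the stage list are illustrative and are not taken from the project's actual scripts.

```python
import shutil
import subprocess
from typing import List

# Later stages as listed on the slide (labels only; nothing here runs them).
LATER_STAGES = (
    "PM3 optimization",
    "Hartree-Fock/STO-6G geometry optimization",
    "B3LYP/6-31G* geometry optimization",
    "TDDFT excitation energies (no geometry optimization)",
)


def build_obabel_command(smiles: str, out_path: str) -> List[str]:
    """Build an obabel command that writes a 3D SDF file for one SMILES."""
    return [
        "obabel",
        "-:" + smiles,   # read the molecule from an inline SMILES string
        "-osdf",         # output format: SDF
        "-O", out_path,  # output file
        "-h",            # add hydrogens
        "--gen3d",       # generate 3D coordinates
    ]


def generate_initial_geometry(smiles: str, out_path: str) -> bool:
    """Run Open Babel if it is available; return True on success."""
    if shutil.which("obabel") is None:
        return False  # obabel is not on PATH, so nothing is run
    result = subprocess.run(
        build_obabel_command(smiles, out_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print(" ".join(build_obabel_command("CCO", "ethanol.sdf")))
```

Passing the arguments as a list avoids shell quoting problems, which matters because SMILES strings routinely contain characters such as `(`, `=`, `#`, and `@`.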
27. How do we do it?
• Heavily using OpenBABEL
• Extracting molecular information
– Sort PubChem compounds by molecular weight
– OpenBABEL
• Encoded by SMILES
– Isomeric SMILES: stereochemistry retained
– OC[C@@H](O1)[C@@H](O)[C@H](O)[C@@H](O)[C@@H](O)1
– CCC[C@@H](O)CCC=CC=CC#CC#CC=CCO
– CC(=O)OCCC(/C)=CC[C@H](C(C)=C)CCC=C
28. Our way from PubChem Compound to
quantum chemistry calculation
aflatoxin
O1C=C[C@H]([C@H]1O2)c3c2cc(OC)c4c3OC(=O)C5=C4CCC(=O)5
[Diagram: the SMILES string is processed by OpenBABEL and passed on to the ab initio calculation]
29. Final results will be
• Uploaded to http://pubchemqc.riken.jp/
• Currently we upload
– input file (ground / excited state)
– Output file (ground / excited state)
– Final geometry in Mol file
30. Scaling of computation
• Embarrassingly parallel for each molecule
• Very roughly speaking, required time for
calculation scales like N^4
– N : molecular weight
• Problems are very hard (complexity theory)
– Hartree-Fock calculation
– DFT (b3lyp) calculation
– geometry optimization
• Practically many molecules can be solved
efficiently
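The two points above (independent molecules, roughly N^4 cost) can be sketched in a few lines. This is only a rough model under the slide's own assumption that cost scales with the fourth power of molecular weight; the reference weight of 160 and the function names are illustrative choices, and the worker is a stand-in for a real quantum chemistry job.

```python
from concurrent.futures import ThreadPoolExecutor


def relative_cost(molecular_weight, reference_weight=160.0, exponent=4.0):
    """Estimated cost relative to a molecule of the reference weight."""
    return (molecular_weight / reference_weight) ** exponent


def process_all(weights, worker=relative_cost, max_workers=4):
    """Each molecule is independent, so the work maps directly onto a pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, weights))


if __name__ == "__main__":
    for weight, cost in zip([160, 250, 320, 500], process_all([160, 250, 320, 500])):
        print(f"MW {weight}: about {cost:.1f}x the cost of MW 160")
```

Under this model, doubling the molecular weight makes a calculation about 16 times more expensive, which is why throughput drops so sharply for heavier compounds.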
31. Computer Resources
• RICC: (Intel Xeon 5570 Westmere, 2.93GHz, 8
cores/node) x 1000
– 1000-10000 molecules/day (MW 160)
– Heavily depend on conditions of other users
– Time limit: 8 hours
• Quest : Intel Core2 duo (1.6GHz/node) x 700
– 3000-8000 molecules / day (MW 160)
– 100-1000 molecules / day (MW 200-300)
– Time limit: 20 hours
• Compounds whose calculations fail are ignored for
now.
33. Molecular weight and Lipinski Rule
• Lipinski’s rule of five (Pfizer's rule of five): rule of
thumb for drug discovery
• No more than 5 hydrogen bond donors
• No more than 10 hydrogen bond acceptors
• A molecular mass less than 500 daltons
• An octanol-water partition coefficient log P not greater than 5
• A molecular weight below 500 is also
very convenient for computational chemistry
– For routine calculations without experimental data
other than molecular formula
– If larger than 500, secondary or higher-order structure
becomes important, e.g., in proteins
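The four criteria listed above translate directly into a filter. The sketch below is a hypothetical helper, not code from the project; it takes precomputed descriptor values as inputs, and the numbers in the usage example are approximate and for illustration only.

```python
def rule_of_five_violations(h_bond_donors, h_bond_acceptors, molecular_mass, log_p):
    """Return the list of criteria (as stated on the slide) that a molecule violates."""
    violations = []
    if h_bond_donors > 5:
        violations.append("more than 5 hydrogen bond donors")
    if h_bond_acceptors > 10:
        violations.append("more than 10 hydrogen bond acceptors")
    if molecular_mass >= 500:
        violations.append("molecular mass not below 500 daltons")
    if log_p > 5:
        violations.append("log P greater than 5")
    return violations


if __name__ == "__main__":
    # Approximate values for ethanol, for illustration only.
    print(rule_of_five_violations(1, 1, 46.07, -0.3))
    # A made-up large, lipophilic molecule.
    print(rule_of_five_violations(6, 12, 650.0, 6.2))
```

Returning the violated criteria rather than a single boolean makes it easy to apply either a strict filter (no violations) or a looser one that tolerates a single violation.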
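34. Molecular Weight distribution at
PubChem
[Figure: molecular weight distribution of PubChem compounds, with the Lipinski limit MW=500 marked, a "We are still here" marker for current progress, and the annotation "30,000,000 molecules (excluding mixtures)"]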
35. How long will it take to finish?
• For drug design, we need to calculate all
molecules of MW < 500
• Total 30,000,000 molecules
– This number may increase in the future
• Current (2014/12/4): 1,100,000 molecules
– Only about 4%
• 10,000 molecules/day -> 8.2 years
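The estimate on this slide is simple arithmetic and can be reproduced as below. The constants are the figures quoted on the slide; the function names are illustrative, and the throughput of 10,000 molecules per day is the slide's assumed rate, not a measured average.

```python
TOTAL_MOLECULES = 30_000_000   # molecules with MW < 500 (from the slide)
DONE_MOLECULES = 1_100_000     # calculated so far (from the slide)
MOLECULES_PER_DAY = 10_000     # assumed throughput (from the slide)


def fraction_done(done, total):
    """Fraction of the target set that has been calculated."""
    return done / total


def years_needed(molecules, per_day):
    """Years needed to process the given number of molecules."""
    return molecules / per_day / 365.0


if __name__ == "__main__":
    print(f"done: {fraction_done(DONE_MOLECULES, TOTAL_MOLECULES):.1%}")
    print(f"whole set: {years_needed(TOTAL_MOLECULES, MOLECULES_PER_DAY):.1f} years")
    remaining = TOTAL_MOLECULES - DONE_MOLECULES
    print(f"remaining: {years_needed(remaining, MOLECULES_PER_DAY):.1f} years")
```

The 8.2-year figure corresponds to processing the whole 30,000,000-molecule set at the assumed rate; counting only the remaining molecules shortens it slightly.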
36. How long will it take to finish?
• 10+ years? No, maybe far less.
• 25 years ago (1990), computers were very slow
– Even ab initio calculations were very difficult on
486DX@25MHz or
68000@10MHz
37. Outlook, prospect, hope…
• Far better in silico screening
– Less or no experiment is necessary
• Even faster calculation using machine learning
– 10,000 molecules / second ?
– Using our data as learning set.
– Not difficult for bio or organic molecules
– Far better initial guess
• Database for chemical reaction
– Precise calculation is required
– GRRM method + machine learning (?)
• Geometry optimization for Protein (PDB)
– Only X ray crystal structures are available
http://pubchemqc.riken.jp/
38. Difficulties in this project
• Parameters needed for calculations vary by
molecule
• Properties can differ depending on the initial guess
• Computer resources
– Raspberry Pi? NVIDIA Jetson? BOINC?
• Molecular encoding never ends
– Neither SMILES nor InChI is complete
– Some corner cases may be chemically interesting.