Machine Learning in computational materials science: an overview, a primer, and a rant, Igor Mosyagin

Pôle Systematic Paris-Region
Machine learning in computational materials science: overview and personal experience
Igor Mosyagin
June 12, 2017
A few disclaimers
This talk is about computational materials science
There might be scientific fields where everything’s different
Materials science ≈ statistical physics
This talk is based on my limited experience
I also express my own opinions, which may not coincide with my
current employer’s.
What is «more» in this context?
In chemistry and physics, the Avogadro constant is the number of
constituent particles, usually atoms or molecules, that are contained
in the amount of substance given by one mole.
$N_A = 6.022 \times 10^{23}\ \mathrm{mol}^{-1}$
At «normal conditions», 1 mol of an ideal gas occupies about 22.4 L.
The water bottles at the PyParis coffee breaks are 0.5 L.
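To make «more» concrete, here is a quick back-of-the-envelope count in Python. It uses the slide's $N_A$ and 22.4 L/mol. The molar mass of water (18.015 g/mol) and a density of 1 g/mL are assumptions added here, and the function names are illustrative only.

```python
AVOGADRO = 6.022e23        # N_A in mol^-1, as quoted on the slide
MOLAR_VOLUME_L = 22.4      # L occupied by 1 mol of ideal gas at "normal conditions"
WATER_MOLAR_MASS = 18.015  # g/mol, assumed value for H2O
WATER_DENSITY = 1.0        # g/mL, assumed for liquid water near room temperature


def gas_molecules(volume_l: float) -> float:
    """Number of ideal-gas molecules in volume_l litres at normal conditions."""
    return volume_l / MOLAR_VOLUME_L * AVOGADRO


def liquid_water_molecules(volume_l: float) -> float:
    """Number of H2O molecules in volume_l litres of liquid water."""
    grams = volume_l * 1000.0 * WATER_DENSITY  # 1 L = 1000 mL
    return grams / WATER_MOLAR_MASS * AVOGADRO


if __name__ == "__main__":
    bottle = 0.5  # L, one coffee-break water bottle
    print(f"gas:   {gas_molecules(bottle):.2e} molecules")
    print(f"water: {liquid_water_molecules(bottle):.2e} molecules")
```

Run as a script, it prints about 1.34e+22 molecules for a bottle of gas and 1.67e+25 for a bottle of liquid water. That is roughly twenty orders of magnitude or more beyond the few hundred atoms a typical simulation cell holds.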
Example: Density Functional Theory
A. Mattsson et al.; doi:10.1088/0965-0393/13/1/r01
Periodic table of chemical elements
Periodic table of a theoretical physicist
Some projects even get featured in a national magazine
Felix A. Faber et al. doi:10.1103/PhysRevLett.117.135502
Computational costs?
Modern state-of-the-art computations: N atoms in a simulation
cell, where N is several hundred. World record: a few thousand.
Adding temperature (while staying at the quantum level) makes
things more complicated.
For temperature-dependent simulations, N is typically several
hundred.
The cost typically scales as $N^3$.
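Here is a minimal sketch of what $N^3$ scaling means in practice. It assumes the cost follows a pure power law, so it ignores the prefactors and crossovers of real codes. The 200-atom, 4-hour reference run is a hypothetical number chosen for illustration, and `extrapolate_cost` is an illustrative name.

```python
def extrapolate_cost(t_ref: float, n_ref: int, n_new: int, exponent: float = 3.0) -> float:
    """Estimate run time for n_new atoms from one reference run, assuming cost ~ N**exponent."""
    if t_ref < 0 or n_ref <= 0 or n_new <= 0:
        raise ValueError("time must be non-negative and atom counts positive")
    return t_ref * (n_new / n_ref) ** exponent


if __name__ == "__main__":
    # Hypothetical reference: a 200-atom static calculation that took 4 hours.
    for n in (200, 400, 1000, 3000):
        print(n, "atoms ->", extrapolate_cost(4.0, 200, n), "hours")
```

Doubling N multiplies the cost by 8. Going from 200 to 3000 atoms turns the hypothetical 4 hours into 13,500 hours on the same resources, which gives a feel for why the world record sits at a few thousand atoms.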
Time to solution? A month, at least
If everything goes well, a static (T = 0) calculation takes a few hours,
and a temperature-dependent simulation takes a few weeks.
Steps in temperature-dependent simulations
1. Preparation. Select parameters, build simulation cells, choose
starting positions, etc.
Tools: bash/awk/sed, GUI tools, Perl, Fortran, MATLAB.
2. Running simulations in an HPC environment (a shared
supercomputer with queues, priorities, and quotas).
Tools: Fortran, sometimes an old version of Fortran.
3. Processing. Parsing the output of calculations, building models on
top, visualization, etc.
Tools: every other language and GUI tool. Also Fortran.
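The preparation step is often done with bash/awk/sed, but it fits Python naturally. Below is a minimal sketch that builds an n × n × n face-centred cubic (fcc) supercell and writes it in the plain XYZ format. The fcc lattice, the aluminium lattice constant of about 4.05 Å, and the names `fcc_supercell` and `write_xyz` are assumptions for illustration. In practice, python-ase already covers this (for example `ase.build.bulk` combined with `Atoms.repeat`).

```python
from itertools import product

# Fractional coordinates of the 4-atom conventional fcc cell.
FCC_BASIS = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0), (0.5, 0.0, 0.5), (0.0, 0.5, 0.5)]


def fcc_supercell(a: float, n: int) -> list:
    """Cartesian positions (same length unit as a) of an n x n x n fcc supercell."""
    positions = []
    for i, j, k in product(range(n), repeat=3):  # loop over conventional cells
        for bx, by, bz in FCC_BASIS:             # 4 atoms per conventional cell
            positions.append(((i + bx) * a, (j + by) * a, (k + bz) * a))
    return positions


def write_xyz(positions, symbol: str) -> str:
    """Return an XYZ-format string: atom count, comment line, one atom per line."""
    lines = [str(len(positions)), f"fcc supercell of {symbol}"]
    lines += [f"{symbol} {x:.6f} {y:.6f} {z:.6f}" for x, y, z in positions]
    return "\n".join(lines)


if __name__ == "__main__":
    cell = fcc_supercell(a=4.05, n=4)  # 4.05 Angstrom: assumed value for Al
    print(len(cell), "atoms")
    print(write_xyz(cell, "Al").splitlines()[2])
```

With n = 4 the cell holds 4 × 4³ = 256 atoms, right in the «several hundred» range typical of these simulations.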
Temperature-dependent calculations are expensive
Everything that is not related to the HPC calculations themselves can be
done in a high-level language.
There are packages that help with these steps, and some even provide
Python interfaces to Fortran codes (python-ase, pymatgen). There are
even a few journals dedicated to this sort of program.
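As an example of the processing step done in a high-level language, here is a minimal Python sketch that extracts energies from a text output file. The `Total energy = ... eV` line format is hypothetical. Every simulation code has its own format, so the regular expression would need adapting, or the parsing could be delegated to python-ase or pymatgen, which read many real output formats.

```python
import re

# Hypothetical output line format: "Total energy = -1234.5678 eV"
ENERGY_RE = re.compile(
    r"Total energy\s*=\s*([-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)\s*eV"
)


def parse_energies(text: str) -> list:
    """Return every total energy (in eV) found in the output text, in order."""
    return [float(m.group(1)) for m in ENERGY_RE.finditer(text)]


if __name__ == "__main__":
    sample = """\
step 1
Total energy = -1052.3310 eV
step 2
Total energy = -1052.4127 eV
"""
    energies = parse_energies(sample)
    print(energies)
    print("final energy:", energies[-1], "eV")
```

Returning every match rather than only the last one keeps the full history, which is what you want when checking convergence or building a model on top of the results.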
Human factors
The lack of software craftsmanship skills leads to people
believing that Fortran is the only option.
Lack of exposure? The «next» step after bash scripts and
Fortran in data processing is usually MATLAB.
(Young) researchers are no different from developers:
smart
arrogant
NIH (not-invented-here) syndrome
lazy
do complex stuff
It might be hard to convince your supervisor to let you
spend resources on improving your «programming» skills
What can be done?
Need more exposure!
If you organize a meetup, put a note on the local university
board or Facebook page. PhD students tend to have interests very
similar to those of developers.
If you have friends or acquaintances in academia, bring them to a
meetup or ask whether they suffer from any computer-related pain.
You might help them save a few weeks of work, and maybe get
a free beer in return.
There are always physics projects on GitHub that would love to have
somebody help with their code
Lead by example, if you can
Scientists believe that data science (DS) is all about classification,
while «real» science is all about regression
If you feel bold enough, organize a tutorial
A lot of people use MATLAB/RStudio only for the convenient layout,
and few know that tools like Jupyter and Spyder exist
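On the «classification vs regression» point: fitting a model to data is regression, and it is everyday data-science work. For example, the $N^3$ cost scaling from the computational-cost slide can be checked by an ordinary least-squares fit of $\log t = p \log N + c$ to timing data. The timings below are synthetic (generated from an exact cubic law), not measurements, and `fit_power_law` is an illustrative name.

```python
import math


def fit_power_law(sizes, times):
    """Least-squares fit of log(t) = p*log(N) + c; returns (p, exp(c)), so t ~ exp(c) * N**p."""
    if len(sizes) != len(times) or len(sizes) < 2:
        raise ValueError("need at least two (N, t) pairs of equal length")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # fitted scaling exponent p
    intercept = y_mean - slope * x_mean  # fitted log-prefactor c
    return slope, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic timings in hours, NOT real measurements: t = 5e-7 * N**3.
    sizes = [64, 128, 216, 432]
    times = [5e-7 * n ** 3 for n in sizes]
    p, prefactor = fit_power_law(sizes, times)
    print(f"fitted exponent p = {p:.3f}, prefactor = {prefactor:.2e}")
```

On the synthetic data it recovers p = 3.000 and a prefactor of 5.00e-07. With real timings, the fitted p shows how your own code actually scales.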
Some authority to use with stubborn people:
doi:10.1371/journal.pcbi.1003285 and doi:10.1371/journal.pone.0067111
A few databases with materials data