Machine Learning in computational materials science: an overview, a primer, and a rant, Igor Mosyagin


PyParis 2017

Published in: Technology

  1. Machine learning in computational materials science: overview and personal experience. Igor Mosyagin, June 12, 2017
  2. A few disclaimers. This talk is about computational materials science; there might be scientific fields where everything is different. Materials science ≈ statistical physics. This talk is based on my limited experience. I also express my own opinion, which may not coincide with my current employer's opinion.
  3. What is «more» in this context? In chemistry and physics, the Avogadro constant is the number of constituent particles, usually atoms or molecules, that are contained in the amount of substance given by one mole: N_A = 6.022 × 10^23 mol^−1. At «normal conditions», 1 mol of an ideal gas occupies about 22.4 L. Water bottles at PyParis coffee breaks are 0.5 L.
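To get a feel for the numbers on the Avogadro slide, here is a minimal sketch that counts particles in a 0.5 L bottle. The gas estimate uses only the values quoted on the slide; the liquid-water estimate additionally assumes a density of about 1000 g/L and a molar mass of about 18 g/mol, which are not on the slide. The function names are illustrative.

```python
# Rough particle counts for a 0.5 L bottle, using the numbers quoted on the slide.
AVOGADRO = 6.022e23         # particles per mole (slide value)
MOLAR_VOLUME_GAS_L = 22.4   # litres per mole of ideal gas at "normal conditions"
BOTTLE_VOLUME_L = 0.5       # bottle size mentioned on the slide


def particles_in_gas(volume_l: float) -> float:
    """Number of gas particles in volume_l litres of ideal gas."""
    return volume_l / MOLAR_VOLUME_GAS_L * AVOGADRO


def molecules_in_water(volume_l: float,
                       density_g_per_l: float = 1000.0,       # assumption, not from the slide
                       molar_mass_g_per_mol: float = 18.0     # assumption, not from the slide
                       ) -> float:
    """Number of molecules in volume_l litres of liquid water."""
    moles = volume_l * density_g_per_l / molar_mass_g_per_mol
    return moles * AVOGADRO


gas_count = particles_in_gas(BOTTLE_VOLUME_L)
water_count = molecules_in_water(BOTTLE_VOLUME_L)
print(f"bottle of gas:   {gas_count:.2e} particles")
print(f"bottle of water: {water_count:.2e} molecules")
```

Either way the count is of the order of 10^22 to 10^25 particles, far beyond the few hundred or few thousand atoms that fit in a simulation cell.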
  4. Example: Density Functional Theory. A. Mattsson et al.; doi:10.1088/0965-0393/13/1/r01
  5. Periodic table of chemical elements
  6. Periodic table of a theoretical physicist
  7. Some projects even get featured in a national magazine. Felix A. Faber et al.; doi:10.1103/PhysRevLett.117.135502
  8. Computational costs? Modern state-of-the-art computations: N atoms in a simulation cell, where N is several hundred. The world record is a few thousand. If one adds temperature (but stays at the quantum level), it becomes more complicated. For temperature-involving simulations N is typically several hundred. The cost typically scales as N^3.
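The practical consequence of cubic scaling can be sketched in a few lines. This is only an illustration of a pure power law with the exponent from the slide; real codes have prefactors and crossover regimes that it ignores, and the function name and the reference size of 100 atoms are assumptions made for the example.

```python
def relative_cost(n_atoms: int, n_reference: int, exponent: float = 3.0) -> float:
    """Cost of an n_atoms cell relative to an n_reference cell,
    assuming cost grows as N**exponent (N^3 by default, as on the slide)."""
    if n_atoms <= 0 or n_reference <= 0:
        raise ValueError("atom counts must be positive")
    return (n_atoms / n_reference) ** exponent


# Hypothetical reference cell of 100 atoms; compare larger cells against it.
for n in (100, 200, 400, 1000):
    print(f"N = {n:5d}: {relative_cost(n, 100):8.1f}x the cost of N = 100")
```

Under this assumption, doubling the number of atoms multiplies the cost by eight, which is why cells rarely grow beyond a few hundred atoms.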
  9. Time to solution? A month, at least. If everything is fine, it takes a few hours for a static (T = 0) calculation and a few weeks for temperature-related simulations. Steps in temperature-related simulations: (1) Preparation: select parameters, build simulation cells, select starting positions, etc. Tools: bash/awk/sed, GUI tools, Perl, Fortran, Matlab. (2) Running simulations in an HPC environment (a shared supercomputer with queues, priorities, and quotas). Tools: Fortran, sometimes an old version of Fortran. (3) Processing: parsing the output of calculations, building models on top, visualization, etc. Tools: every other language and GUI tool. Also Fortran.
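The processing step (parsing the output of calculations) is the kind of task that is often done with awk/sed but fits comfortably in plain Python. The sketch below is not tied to any real simulation code: the log format, the sample values, and all names are hypothetical and exist only to show the shape of such a script.

```python
import re
from statistics import mean

# Hypothetical output excerpt; every real simulation code has its own format.
SAMPLE_OUTPUT = """\
step 1  T = 300.0 K  E_total = -1234.5678 eV
step 2  T = 301.2 K  E_total = -1234.6012 eV
warning: something unrelated
step 3  T = 299.5 K  E_total = -1234.5890 eV
"""

# One regular expression per record type keeps the parsing rules in one place.
LINE_PATTERN = re.compile(
    r"step\s+(?P<step>\d+)\s+T\s*=\s*(?P<temp>[-+]?\d+\.?\d*)\s*K\s+"
    r"E_total\s*=\s*(?P<energy>[-+]?\d+\.?\d*)\s*eV"
)


def parse_steps(text: str) -> list:
    """Return (step, temperature_K, energy_eV) tuples; non-matching lines are skipped."""
    records = []
    for line in text.splitlines():
        match = LINE_PATTERN.search(line)
        if match:
            records.append(
                (int(match["step"]), float(match["temp"]), float(match["energy"]))
            )
    return records


records = parse_steps(SAMPLE_OUTPUT)
mean_energy = mean(energy for _, _, energy in records)
print(f"parsed {len(records)} steps, mean energy = {mean_energy:.4f} eV")
```

Once the records are plain Python tuples, building models or plots on top of them no longer depends on the format of the original output file.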
  10. Temperature-involved calculations are expensive. Everything that is not related to HPC calculations can be done in a high-level language. There are some packages that help with those steps. Sometimes those packages even provide Python interfaces to Fortran codes (python-ase, pymatgen). There is a separate journal (a few, actually) for that sort of program.
  11. Human factors. The lack of software craftsmanship skills leads to people believing that Fortran is the only option. Lack of exposure? The «next» step after bash scripts and Fortran in data processing is usually Matlab. (Young) researchers are no different from developers: smart, arrogant, prone to NIH syndrome, lazy, and doing complex stuff. It might be hard to convince your supervisor to allow you to spend resources on improving your «programming» skills.
  12. What can be done? Need more exposure! If you organize a meetup, put a note on the local university board/FB; PhD students tend to have a very similar set of interests to developers. If you have friends/acquaintances in academia, bring them to a meetup or ask them if they suffer any computer-related pain; you might help them save a few weeks of work, and maybe get a free beer in return. There are always GitHub physics projects that would love to have somebody help them with code. Lead by example, if you can. Scientists believe that DS is all about classification, while «real» science is all about regressions. If you feel bold enough, organize a tutorial. A lot of people use Matlab/RStudio only for the convenient layout, and few know that tools like Jupyter/Spyder exist.
  13. Some authority to use with stubborn people: doi:10.1371/journal.pcbi.1003285 and doi:10.1371/journal.pone.0067111
  14. A few databases with materials data