VASP: Some Accumulated Wisdom

Jonathan Skelton

Feb. 2, 2016


Some "accumulated wisdom" from several years of using the Vienna ab initio Simulation Package (VASP) code for computational modelling. Includes tips on convergence and parallelisation.


- VASP: Some Accumulated Wisdom (J. M. Skelton, WMD Group Meeting, 21st September 2015)
- WMD Group Meeting, September 2015 | Slide 2: Convergence: Parameters
  - Four key technical parameters in a VASP calculation:
    - Basis set: ENCUT and PREC (or, alternatively, NGX, NGY, NGZ)
    - k-point sampling: KPOINTS file and SIGMA
    - [For certain types of pseudopotential.] Augmentation grid: ENAUG and PREC (or, alternatively, NGXF, NGYF, NGZF)
    - The space in which the projection operators are applied (LREAL)
- WMD Group Meeting, September 2015 | Slide 3: Convergence: Augmentation grid
  - A second, finer mesh is used to represent the charge density near the ion cores; it is controlled by ENAUG (or PREC + EAUG in the POTCAR files), which determines NG*F
- WMD Group Meeting, September 2015 | Slide 4: Convergence: ZnS revisited
  - For calculations on ZnS with TPSS, ENAUG needs to be increased from the default (but ENCUT = 550 eV is fine); this is equivalent to increasing NG*F, but without also increasing NG* as in the QHA-ExC paper, which is evidently unnecessary (!)
- WMD Group Meeting, September 2015 | Slide 5: Convergence: ZnS revisited
  - For calculations on ZnS with TPSS, ENAUG needs to be increased from the default (but ENCUT = 550 eV is fine); this is equivalent to increasing NG*F, but without also increasing NG*, which is wasteful

| ENCUT / eV | ENAUG / eV | NG* | NG*F | Noise? | t / min |
|------------|------------|-----|------|--------|---------|
| 550 | 575.892 | 120 | 160 | ✗ | -   |
| 650 | 575.892 | 128 | 160 | ✗ | -   |
| 750 | 575.892 | 140 | 160 | ✗ | -   |
| 850 | 575.892 | 150 | 160 | ✗ | -   |
| 550 | 675.892 | 120 | 180 | ✓ | 116 |
| 550 | 775.892 | 120 | 192 | ✓ | 108 |
| 550 | 875.892 | 120 | 200 | ✓ | 113 |
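The ENAUG scan in the table above is easy to script. The sketch below assembles a series of INCAR fragments at fixed ENCUT; the function name and the decision to hold PREC = Accurate are assumptions for illustration, and actually writing the files and submitting the jobs is left to the surrounding workflow.

```python
# Sketch of an ENAUG convergence scan at fixed ENCUT, mirroring the table
# above. "enaug_scan" is a hypothetical helper, not part of VASP.

def enaug_scan(encut, enaug_values):
    """Return a list of (enaug, incar_text) pairs for a convergence scan."""
    jobs = []
    for enaug in enaug_values:
        incar = "\n".join([
            f"ENCUT = {encut}",          # hold the basis-set cutoff fixed
            f"ENAUG = {enaug:.3f}",      # vary only the augmentation grid
            "PREC = Accurate",           # assumed setting for the scan
        ])
        jobs.append((enaug, incar))
    return jobs

jobs = enaug_scan(550, [575.892, 675.892, 775.892, 875.892])
```

Each resulting INCAR differs only in ENAUG, so any change in the noise in the forces or energies can be attributed to the NG*F grids alone.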
- WMD Group Meeting, September 2015 | Slide 6: The VASP SCF cycle
  - The SCF cycle proceeds in two phases:
    - The plane-wave coefficients are initialised randomly and "pre-optimised" within a fixed potential given by the superposition of atomic densities (INIWAV, NELMDL)
    - The wavefunctions and density are then optimised self-consistently to convergence (EDIFF, NELMIN, NELM)
    - If an initial charge density exists (e.g. from a previous SCF or a converged CHGCAR/WAVECAR), the first step can be skipped (ISTART, ICHARG)
  - To accelerate convergence, the output density from step N is not fed directly into step N+1, but is mixed with the input density (IMIX, INIMIX, MIXPRE, MAXMIX, AMIX, AMIN, AMIX_MAG, BMIX, BMIX_MAG, WC)
  - For the mathematically minded: http://th.fhi-berlin.mpg.de/th/Meetings/DFT-workshop-Berlin2011/presentations/2011-07-14_Marsman_Martijn.pdf
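A toy example of why mixing helps: for a fixed-point map whose derivative exceeds 1 in magnitude, feeding the output straight back in diverges, while damping it with a small mixing amplitude converges. This is only an analogy for the role of AMIX; VASP's actual mixer uses the more sophisticated Pulay/Broyden schemes controlled by the tags listed above.

```python
# Toy linear mixing: x_{n+1} = (1 - mix) * x_n + mix * f(x_n).
# The map f(x) = 3 - 2x has fixed point x* = 1 with |f'| = 2 > 1, so plain
# iteration (mix = 1) diverges, but 30 % mixing contracts the error.

def iterate(f, x0, mix, steps):
    x = x0
    for _ in range(steps):
        x = (1.0 - mix) * x + mix * f(x)  # blend output back into input
    return x

f = lambda x: 3.0 - 2.0 * x
plain = iterate(f, 0.5, 1.0, 20)  # no mixing: error doubles each step
mixed = iterate(f, 0.5, 0.3, 20)  # mixed: error shrinks by 10x each step
```

With mixing the effective map becomes 0.1 x + 0.9, so the error decays geometrically instead of growing; choosing the mixing parameters well is exactly the trade-off AMIX/BMIX and friends expose.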
- WMD Group Meeting, September 2015 | Slide 7: The VASP SCF cycle

```
       N       E                     dE             d eps          ncg     rms          rms(c)
DAV:   1     0.425437171796E+04    0.42544E+04   -0.38613E+05     920   0.178E+03
DAV:   2    -0.114846409831E+04   -0.54028E+04   -0.51653E+04    1130   0.323E+02
DAV:   3    -0.169662738043E+04   -0.54816E+03   -0.53994E+03    1130   0.100E+02
DAV:   4    -0.171494085624E+04   -0.18313E+02   -0.18206E+02    1160   0.198E+01
DAV:   5    -0.171553585547E+04   -0.59500E+00   -0.59387E+00    1220   0.331E+00   0.706E+01
RMM:   6    -0.159733114612E+04    0.11820E+03   -0.21124E+02     920   0.147E+01   0.352E+01
RMM:   7    -0.157358217358E+04    0.23749E+02   -0.82778E+01     920   0.937E+00   0.173E+01
RMM:   8    -0.157195752202E+04    0.16247E+01   -0.10028E+01     922   0.344E+00   0.736E+00
RMM:   9    -0.157170732229E+04    0.25020E+00   -0.24051E+00     920   0.173E+00   0.186E+00
RMM:  10    -0.157170709721E+04    0.22508E-03   -0.17654E-01     932   0.561E-01   0.965E-01
RMM:  11    -0.157173130475E+04   -0.24208E-01   -0.10240E-01     920   0.332E-01   0.466E-01
RMM:  12    -0.157174953342E+04   -0.18229E-01   -0.23004E-02     920   0.198E-01   0.213E-01
RMM:  13    -0.157175624413E+04   -0.67107E-02   -0.12470E-02     920   0.134E-01   0.938E-02
RMM:  14    -0.157175705572E+04   -0.81159E-03   -0.49641E-03     922   0.781E-02   0.577E-02
RMM:  15    -0.157175711576E+04   -0.60039E-04   -0.62130E-04     922   0.302E-02   0.211E-02
RMM:  16    -0.157175714692E+04   -0.31162E-04   -0.18825E-04     932   0.152E-02   0.146E-02
RMM:  17    -0.157175715237E+04   -0.54516E-05   -0.37827E-05     935   0.701E-03   0.564E-03
RMM:  18    -0.157175715526E+04   -0.28845E-05   -0.88070E-06     824   0.340E-03   0.361E-03
RMM:  19    -0.157175715551E+04   -0.24851E-06   -0.27408E-06     657   0.209E-03
   1 F= -.15717572E+04 E0= -.15717572E+04  d E =-.291254-147
```

  Annotations from the slide:
  - DAV:/RMM: — the minimisation algorithm; the first NELMDL steps run in a fixed potential, and the loop runs between NELMIN and NELM steps in total
  - E: total free energy
  - dE, d eps: change in the total energy and in the eigenvalues
  - ncg: number of evaluations of H|Ψ⟩
  - rms(c): difference between the input and output density; oscillations probably indicate convergence problems
  - F, E0: total free energy and zero-broadening (σ → 0) energy
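When diagnosing convergence problems it is often handy to pull E and dE out of listings like the one above and plot them against iteration. A minimal parser, assuming the whitespace-separated layout shown (the function name and returned dict are illustrative, not any VASP API):

```python
# Minimal parser for SCF lines like those in the listing above.
# Assumes the columns: algorithm, N, E, dE, d eps, ncg, rms[, rms(c)].

def parse_scf_line(line):
    fields = line.split()
    return {
        "algo": fields[0].rstrip(":"),  # "DAV" or "RMM"
        "step": int(fields[1]),         # iteration number N
        "E": float(fields[2]),          # total free energy
        "dE": float(fields[3]),         # change in total energy
    }

step = parse_scf_line(
    "RMM:  10    -0.157170709721E+04    0.22508E-03   -0.17654E-01"
    "     932   0.561E-01   0.965E-01"
)
```

Applied over a whole OSZICAR, a simple plot of |dE| on a log scale makes slow or oscillating convergence obvious at a glance.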
- WMD Group Meeting, September 2015 | Slide 8: The ALGO tag
  - ALGO is the "recommended" tag for selecting the electronic-minimisation algorithm; most of the algorithms have "subswitches", which can be selected using IALGO
  - I tend to use one of four ALGOs:
    - RMM-DIIS (ALGO = VeryFast): fastest per SCF step, best parallelised, and converges quickly close to a minimum, but can struggle with difficult systems
    - Blocked Davidson (ALGO = Normal): slower than RMM-DIIS, but usually stable, although it can still struggle with difficult problems (e.g. magnetism, meta-GGAs and hybrids)
    - Davidson/RMM-DIIS (ALGO = Fast): uses ALGO = Normal for the "pre-optimisation", then switches to ALGO = VeryFast; a good default choice
    - All-band conjugate gradient (ALGO = All): slow, but very stable; use as a fallback when ALGO = Normal struggles, and for hybrids
- WMD Group Meeting, September 2015 | Slide 9: Taming TPSS (and other meta-GGAs)
  - In my experience, meta-GGAs can sometimes be more difficult to converge than standard GGA functionals (or even hybrids); RMM-DIIS (ALGO = Fast | VeryFast) sometimes struggles

```
!ALGO = Normal | All
!GGA = PS
METAGGA = TPSS | revTPSS | M06L
LASPH = .TRUE.
LMIXTAU = .TRUE.
!ENAUG = MAX(EAUG) * 1.5
!NGXF = <>; NGYF = <>; NGZF = <>
```

  Notes from the slide:
  - ALGO = Normal | All: fall back to these when RMM-DIIS struggles
  - GGA = PS: don't forget that (rev)TPSS are based on PBE
  - LASPH: aspherical gradient corrections inside the PAW spheres
  - LMIXTAU: pass the kinetic-energy density to the charge-density mixer
  - ENAUG/NG*F: may need to be increased if very accurate forces are needed (e.g. phonons)
- WMD Group Meeting, September 2015 | Slide 10: Parallelisation
  - The newest versions of VASP implement four levels of parallelism:
    - k-point parallelism: KPAR
    - Band parallelism and data distribution: NCORE and NPAR
    - Parallelisation and data distribution over plane-wave coefficients (= FFTs; done over planes along NGZ): LPLANE
    - Parallelisation of some linear-algebra operations using ScaLAPACK (notionally set at compile time, but can be controlled using LSCALAPACK)
  - Effective parallelisation will:
    - minimise (relatively slow) communication between MPI processes,
    - distribute data to reduce memory requirements,
    - and make sure the MPI processes have enough work to keep them busy
- WMD Group Meeting, September 2015 | Slide 11: Parallelisation: Workload distribution
  - Workload is distributed over KPAR k-point groups, NBANDS band groups and NGZ plane-wave coefficient (FFT) groups [not 100 % sure how this works…]
- WMD Group Meeting, September 2015 | Slide 12: Parallelisation: Data distribution
  - Data is distributed over NBANDS band groups and NGZ plane-wave coefficient (FFT) groups [also not 100 % sure how this works…]
- WMD Group Meeting, September 2015 | Slide 13: Parallelisation: KPAR
  - During a standard DFT calculation, k-points are independent, so k-point parallelism should scale linearly, although perhaps not in practice: https://www.nsc.liu.se/~pla/blog/2015/01/12/vasp-how-many-cores/
  - <#cores> must be divisible by KPAR, but the parallelisation uses a "round-robin" algorithm, so <#k-points> does not need to be divisible by KPAR; check how many irreducible k-points you have (head IBZKPT) and set KPAR accordingly
  - Example from the slide, with three irreducible k-points k1-k3: KPAR = 1 takes 3 rounds [OK]; KPAR = 2 takes 2 rounds, with one group idle in the second [bad]; KPAR = 3 takes 1 round [good]
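The round-robin schedule above reduces to simple arithmetic: with n_k irreducible k-points split over KPAR groups, the wall time goes as the number of sequential "rounds", ceil(n_k / KPAR). A sketch (the function name is illustrative):

```python
# Rounds needed to work through n_kpoints with KPAR round-robin groups:
# each group takes every KPAR-th k-point, so the slowest group does
# ceil(n_kpoints / kpar) of them in sequence.
import math

def kpoint_rounds(n_kpoints, kpar):
    return math.ceil(n_kpoints / kpar)

# The 3-k-point example from the slide:
# KPAR = 1 -> 3 rounds (OK); KPAR = 2 -> 2 rounds, one group idle in the
# second round (bad); KPAR = 3 -> 1 round (good).
rounds = [kpoint_rounds(3, kpar) for kpar in (1, 2, 3)]
```

The "bad" case is exactly when ceil(n_k / KPAR) is no smaller than for a cheaper KPAR: KPAR = 2 still needs two rounds here, so half the extra cores sit idle.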
- WMD Group Meeting, September 2015 | Slide 14: Parallelisation: NCORE and NPAR
  - NCORE: the number of cores in each band group; NPAR: the number of bands treated simultaneously; the two are related by NCORE = <#cores> / NPAR
  - Why not NCORE = 1/NPAR = <#cores> (the default)? More band groups (probably) increases memory pressure and incurs a substantial communication overhead (benchmark speedups from the slide: 7.08x, 6.41x, 6.32x)
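The NCORE = <#cores> / NPAR relation means the band groups must tile the MPI ranks exactly; a small sanity check (the helper name is an assumption, not a VASP utility):

```python
# NCORE and NPAR are two views of the same split: NPAR band groups of
# NCORE cores each must cover all MPI ranks with nothing left over.

def ncore_from_npar(ncores, npar):
    if ncores % npar != 0:
        raise ValueError("NPAR must divide the number of cores")
    return ncores // npar

ncore = ncore_from_npar(64, 4)  # 4 band groups of 16 cores each
```

Running this check before submitting a job catches the common mistake of an NPAR that does not divide the core count.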
- WMD Group Meeting, September 2015 | Slide 15: Parallelisation: NCORE and NPAR
  - WARNING: VASP will increase the default NBANDS to the nearest multiple of the number of band groups
  - Since the electronic minimisation scales as a power of NBANDS, this can backfire in calculations with a large NPAR (e.g. those requiring NPAR = <#cores>)
  - The default is NBANDS = NELECT/2 + NIONS/2 (or NBANDS = (3/5) NELECT + NMAG for spin-polarised calculations)
  - Example system: 238 atoms w/ 272 electrons; default NBANDS = 455

| Cores | Default NBANDS | Adjusted NBANDS |
|-------|----------------|-----------------|
| 96  | 455 | 480 |
| 128 | 455 | 512 |
| 192 | 455 | 576 |
| 256 | 455 | 512 |
| 384 | 455 | 768 |
| 512 | 455 | 512 |
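The adjustment is just rounding up to the next multiple of the number of band groups, which reproduces the table when NPAR = <#cores>. A sketch (function name assumed for illustration):

```python
# VASP bumps the default NBANDS up to the nearest multiple of the number of
# band groups; with NPAR = <#cores> the groups equal the core count, so
# small defaults can balloon badly.
import math

def adjusted_nbands(default, n_groups):
    return math.ceil(default / n_groups) * n_groups

# Reproduce the table for a default of 455 bands:
adjusted = {cores: adjusted_nbands(455, cores)
            for cores in (96, 128, 192, 256, 384, 512)}
```

Note the non-monotonic behaviour: 384 cores forces 768 bands while 512 cores needs only 512, so more cores can mean strictly less electronic-minimisation work per band group.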
- WMD Group Meeting, September 2015 | Slide 16: Parallelisation: Memory
  - KPAR: the current implementation does not distribute data over k-point groups, so KPAR = N will use N x more memory than KPAR = 1
  - NPAR/NCORE: data is distributed over band groups, so decreasing NPAR/increasing NCORE by a factor of N will reduce memory requirements by N x
  - NPAR takes precedence over NCORE; if you use "master" INCAR files, make sure you don't define both
  - The defaults for NPAR/NCORE (NPAR = <#cores>, NCORE = 1) are usually a poor choice for both memory and performance
  - Band parallelism for hybrid functionals has been supported since VASP 5.3.5; for memory-intensive calculations, it is a good alternative to underpopulating nodes
  - LPLANE: distributes data over plane-wave coefficients, and speeds things up by reducing communication during FFTs; the default is LPLANE = .TRUE., and it should only need to be changed for massively-parallel architectures (e.g. BG/Q)
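The first two bullets imply a rough relative-memory model: KPAR multiplies memory use, while increasing NCORE divides it. This is a back-of-the-envelope model built only from the scaling rules above, not a measured figure:

```python
# Rough per-rank memory relative to KPAR = 1, NCORE = 1, using only the
# two scaling rules from the slide: KPAR = N costs N x more (k-point data
# is not distributed); NCORE = N costs N x less (band data is distributed).

def relative_memory(kpar, ncore):
    return kpar / ncore

base    = relative_memory(1, 1)  # reference configuration
halved  = relative_memory(1, 2)  # doubling NCORE halves memory
doubled = relative_memory(2, 1)  # KPAR = 2 doubles memory
```

The practical upshot matches the slide: when a job runs out of memory, raise NCORE before reaching for KPAR, and only add KPAR when memory headroom allows.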
- WMD Group Meeting, September 2015 | Slide 17: Parallelisation: ScaLAPACK
  - RMM-DIIS (ALGO = VeryFast | Fast) involves three steps: EDDIAG (subspace diagonalisation), RMM-DIIS (electronic minimisation) and ORTHCH (wavefunction orthogonalisation)

| Routine | 312 atoms | 624 atoms | 1,248 atoms | 1,872 atoms |
|---------|-----------|-----------|-------------|-------------|
| EDDIAG   | 2.90 (18.64 %)  | 12.97 (22.24 %) | 75.26 (26.38 %)  | 208.29 (31.31 %) |
| RMM-DIIS | 12.39 (79.63 %) | 42.73 (73.27 %) | 187.62 (65.78 %) | 379.80 (57.10 %) |
| ORTHCH   | 0.27 (1.74 %)   | 2.62 (4.49 %)   | 22.36 (7.84 %)   | 77.11 (11.59 %)  |

  - EDDIAG and ORTHCH formally scale as N^3, and rapidly begin to dominate the SCF cycle time for large calculations
  - A good ScaLAPACK library can improve the performance of these routines in massively-parallel calculations
  - See also: https://www.nsc.liu.se/~pla/blog/2014/01/30/vasp9k/
- WMD Group Meeting, September 2015 | Slide 18: Parallelisation: My "rules of thumb"
  - For x86_64 IB systems (Archer, Balena, Neon…):
    - Use KPAR in preference to NPAR
    - Set NPAR = <#nodes> / KPAR, or NCORE = <#cores/node>
    - Use 1 node/band group per 50 atoms; you may want to use 2 nodes/50 atoms for hybrids, or decrease to ½ node per band group for < 10 atoms
    - ALGO = Fast is usually a good choice, except for badly-behaved systems
    - Leave LPLANE at the default (.TRUE.)
  - For the IBM BG/Q (STFC Hartree):
    - The Hartree machine currently uses VASP 5.2.x, so no KPAR
    - Try to choose a square number of cores, and set NPAR = sqrt(<#cores>)
    - Consider setting LPLANE = .FALSE. if <#cores> ≥ NGZ
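The x86_64 rules of thumb can be sketched as a helper that picks KPAR and NPAR for a given job. Everything here is an assumption layered on the slide's advice: the function name, the "largest divisor of the node count" tie-break, and the ordering of the rules; treat the output as a starting point to benchmark, not a prescription.

```python
# Hypothetical helper applying the rules of thumb above: prefer KPAR, keep
# <#nodes> divisible by KPAR, then set NPAR = <#nodes> / KPAR.

def suggest_parallelisation(n_nodes, n_irreducible_kpoints):
    kpar = min(n_irreducible_kpoints, n_nodes)  # KPAR in preference to NPAR
    while n_nodes % kpar != 0:                  # keep the split exact
        kpar -= 1                               # fall back to a divisor
    npar = n_nodes // kpar                      # NPAR = <#nodes> / KPAR
    return {"KPAR": kpar, "NPAR": npar}

# e.g. an 8-node job with 4 irreducible k-points (head IBZKPT to check):
settings = suggest_parallelisation(n_nodes=8, n_irreducible_kpoints=4)
```

For hybrids or very small cells, the per-atom node-count rules on the slide would then scale the band groups up or down from this baseline.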