
VASP: Some Accumulated Wisdom

J. M. Skelton, Presidential Fellow at University of Manchester
Feb. 2, 2016


  1. VASP: Some Accumulated Wisdom
     J. M. Skelton | WMD Group Meeting, 21st September 2015
  2. Convergence: Parameters
     • Four key technical parameters in a VASP calculation:
       o Basis set: ENCUT and PREC (or, alternatively, NGX, NGY and NGZ)
       o k-point sampling: the KPOINTS file and SIGMA
       o Augmentation grid [for certain types of pseudopotential]: ENAUG and PREC (or, alternatively, NGXF, NGYF and NGZF)
       o Which space the projection operators are applied in (LREAL)
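     As a rough illustration, a minimal INCAR covering these tags might look like the following; the values are placeholders that must be converged for each system, and the k-point mesh itself lives in the separate KPOINTS file:

       PREC   = Accurate   ! among other things, sets the default FFT grids (NG* and NG*F)
       ENCUT  = 500        ! plane-wave cutoff in eV; converge per system
       ENAUG  = 700        ! augmentation-grid cutoff in eV, which sets NG*F (see the next slides)
       SIGMA  = 0.05       ! smearing width in eV (used together with ISMEAR)
       LREAL  = Auto       ! real-space projection for large cells; .FALSE. (reciprocal space) for small, accurate calculations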
  3. Convergence: Augmentation grid
     • A second, finer mesh is used to represent the charge density near the ion cores: controlled by ENAUG (or PREC + EAUG in the POTCAR files), which determines NG*F
  4. Convergence: ZnS revisited
     • For calculations on ZnS with TPSS, ENAUG needs to be increased from the default (but ENCUT = 550 eV is fine); this is equivalent to increasing NG*F, but without also increasing NG* as in the QHA-ExC paper, which is evidently unnecessary (!)
  5. Convergence: ZnS revisited
     • For calculations on ZnS with TPSS, ENAUG needs to be increased from the default (but ENCUT = 550 eV is fine); this is equivalent to increasing NG*F, but without also increasing NG*, which is wasteful

       ENCUT / eV   ENAUG / eV   NG*   NG*F   Noise?   t / min
       550          575.892      120   160    ✗        -
       650          575.892      128   160    ✗        -
       750          575.892      140   160    ✗        -
       850          575.892      150   160    ✗        -
       550          675.892      120   180    ✓        116
       550          775.892      120   192    ✓        108
       550          875.892      120   200    ✓        113
  6. The VASP SCF cycle
     • The SCF cycle proceeds in two phases:
       o The plane-wave coefficients are initialised randomly and “pre-optimised” within a fixed potential given by the superposition of atomic densities (INIWAV, NELMDL)
       o The wavefunctions and density are then optimised self-consistently to convergence (EDIFF, NELMIN, NELM)
       o If an initial charge density exists (e.g. from a previous SCF or a converged CHGCAR/WAVECAR), the first step can be skipped (ISTART, ICHARG)
     • To accelerate convergence, the output density from step N is not fed directly into the next step N+1, but is mixed with the input density (IMIX, INIMIX, MIXPRE, MAXMIX, AMIX, AMIN, AMIX_MAG, BMIX, BMIX_MAG, WC)
     • For the mathematically-minded: http://th.fhi-berlin.mpg.de/th/Meetings/DFT-workshop-Berlin2011/presentations/2011-07-14_Marsman_Martijn.pdf
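     A hedged sketch of where these tags sit in an INCAR, using typical (mostly default) values as placeholders; the mixing parameters in particular are usually best left at their defaults:

       ISTART = 0      ! start from scratch; ISTART = 1 reads an existing WAVECAR
       ICHARG = 2      ! initial density from a superposition of atomic densities; ICHARG = 1 reads CHGCAR
       INIWAV = 1      ! random initialisation of the plane-wave coefficients
       NELMDL = -5     ! initial non-self-consistent steps in the fixed potential (negative: applied only at the start)
       NELMIN = 4      ! minimum number of SCF steps
       NELM   = 100    ! maximum number of SCF steps
       EDIFF  = 1E-6   ! SCF convergence criterion on the total energy (eV)
       AMIX   = 0.4    ! linear-mixing parameter for the density mixer
       BMIX   = 1.0    ! Kerker cutoff for the density mixer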
  7. The VASP SCF cycle

             N        E                      dE             d eps          ncg    rms          rms(c)
       DAV:   1     0.425437171796E+04    0.42544E+04   -0.38613E+05    920   0.178E+03
       DAV:   2    -0.114846409831E+04   -0.54028E+04   -0.51653E+04   1130   0.323E+02
       DAV:   3    -0.169662738043E+04   -0.54816E+03   -0.53994E+03   1130   0.100E+02
       DAV:   4    -0.171494085624E+04   -0.18313E+02   -0.18206E+02   1160   0.198E+01
       DAV:   5    -0.171553585547E+04   -0.59500E+00   -0.59387E+00   1220   0.331E+00   0.706E+01
       RMM:   6    -0.159733114612E+04    0.11820E+03   -0.21124E+02    920   0.147E+01   0.352E+01
       RMM:   7    -0.157358217358E+04    0.23749E+02   -0.82778E+01    920   0.937E+00   0.173E+01
       RMM:   8    -0.157195752202E+04    0.16247E+01   -0.10028E+01    922   0.344E+00   0.736E+00
       RMM:   9    -0.157170732229E+04    0.25020E+00   -0.24051E+00    920   0.173E+00   0.186E+00
       RMM:  10    -0.157170709721E+04    0.22508E-03   -0.17654E-01    932   0.561E-01   0.965E-01
       RMM:  11    -0.157173130475E+04   -0.24208E-01   -0.10240E-01    920   0.332E-01   0.466E-01
       RMM:  12    -0.157174953342E+04   -0.18229E-01   -0.23004E-02    920   0.198E-01   0.213E-01
       RMM:  13    -0.157175624413E+04   -0.67107E-02   -0.12470E-02    920   0.134E-01   0.938E-02
       RMM:  14    -0.157175705572E+04   -0.81159E-03   -0.49641E-03    922   0.781E-02   0.577E-02
       RMM:  15    -0.157175711576E+04   -0.60039E-04   -0.62130E-04    922   0.302E-02   0.211E-02
       RMM:  16    -0.157175714692E+04   -0.31162E-04   -0.18825E-04    932   0.152E-02   0.146E-02
       RMM:  17    -0.157175715237E+04   -0.54516E-05   -0.37827E-05    935   0.701E-03   0.564E-03
       RMM:  18    -0.157175715526E+04   -0.28845E-05   -0.88070E-06    824   0.340E-03   0.361E-03
       RMM:  19    -0.157175715551E+04   -0.24851E-06   -0.27408E-06    657   0.209E-03
          1 F= -.15717572E+04 E0= -.15717572E+04  d E =-.291254-147

     • Annotations from the slide:
       o DAV/RMM: minimisation algorithm; the first NELMDL steps are in a fixed potential, and between NELMIN and NELM steps are taken in total
       o E: total free energy; dE and d eps: change in the total energy and in the eigenvalues
       o ncg: number of evaluations of H|Ψ⟩
       o rms(c): difference in input and output density; oscillations probably indicate convergence problems
       o Final line: total free (F) and zero-broadening (σ → 0) energy (E0)
  8. The ALGO tag
     • ALGO is the “recommended” tag for selecting the electronic-minimisation algorithm
     • Most of the algorithms have “subswitches”, which can be selected using IALGO
     • I tend to use one of four ALGOs:
       o RMM-DIIS (ALGO = VeryFast): fastest per SCF step, best parallelised, and converges quickly close to a minimum, but can struggle with difficult systems
       o Blocked Davidson (ALGO = Normal): slower than RMM-DIIS, but usually stable, although it can still struggle with difficult problems (e.g. magnetism, meta-GGAs and hybrids)
       o Davidson/RMM-DIIS (ALGO = Fast): uses ALGO = Normal for the “pre-optimisation”, then switches to ALGO = VeryFast; a good default choice
       o All-band conjugate gradient (ALGO = All): slow, but very stable; use as a fallback when ALGO = Normal struggles, and for hybrids
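     In INCAR terms, a common starting point based on the above (illustrative, not prescriptive):

       ALGO = Fast       ! blocked Davidson for the first steps, then RMM-DIIS
       ! fallbacks for badly-behaved systems:
       !ALGO = Normal    ! blocked Davidson throughout
       !ALGO = All       ! all-band conjugate gradient; slow but robust (also for hybrids)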
  9. Taming TPSS (and other meta-GGAs)
     • In my experience, meta-GGAs can sometimes be more difficult to converge than standard GGA functionals (or even hybrids)

       !ALGO = Normal | All               ! RMM-DIIS (ALGO = Fast | VeryFast) sometimes struggles
       !GGA = PS                          ! don't forget: (rev)TPSS are based on PBE
       METAGGA = TPSS | revTPSS | M06L
       LASPH = .TRUE.                     ! aspherical gradient corrections inside the PAW spheres
       LMIXTAU = .TRUE.                   ! pass the kinetic-energy density to the charge-density mixer
       !ENAUG = MAX(EAUG) * 1.5           ! may need to increase ENAUG/NG*F if very accurate
       !NGXF = <>; NGYF = <>; NGZF = <>   !   forces are needed (e.g. phonons)
  10. Parallelisation
     • The newest versions of VASP implement four levels of parallelism:
       o k-point parallelism: KPAR
       o Band parallelism and data distribution: NCORE and NPAR
       o Parallelisation and data distribution over plane-wave coefficients (= FFTs; done over planes along NGZ): LPLANE
       o Parallelisation of some linear-algebra operations using ScaLAPACK (notionally set at compile time, but can be controlled using LSCALAPACK)
     • Effective parallelisation will:
       o minimise (relatively slow) communication between MPI processes,
       o distribute data to reduce memory requirements,
       o and make sure the MPI processes have enough work to keep them busy
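     The corresponding INCAR tags, with illustrative values for a hypothetical mid-sized job; these must be matched to the machine and the calculation (see the rules of thumb on the last slide):

       KPAR   = 2            ! number of k-point groups
       NCORE  = 12           ! cores per band group (NPAR = <#cores>/NCORE)
       LPLANE = .TRUE.       ! plane-wise distribution of the FFT data (the default)
       !LSCALAPACK = .TRUE.  ! use ScaLAPACK if it was compiled in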
  11. Parallelisation: Workload distribution
     [Diagram: cores divided into KPAR k-point groups, then NPAR band groups, then NGZ FFT groups (?)]
     • Workload distribution over KPAR k-point groups, NBANDS band groups and NGZ plane-wave coefficient (FFT) groups [not 100 % sure how this works…]
  12. Parallelisation: Data distribution
     [Diagram: data divided over KPAR k-point groups, then NPAR band groups, then NGZ FFT groups (?)]
     • Data distribution over NBANDS band groups and NGZ plane-wave coefficient (FFT) groups [also not 100 % sure how this works…]
  13. Parallelisation: KPAR
     • During a standard DFT calculation, k-points are independent, so k-point parallelism should be linearly scaling, although perhaps not in practice: https://www.nsc.liu.se/~pla/blog/2015/01/12/vasp-how-many-cores/
     • <#cores> must be divisible by KPAR, but the parallelisation is via a “round-robin” algorithm, so <#k-points> does not need to be divisible by KPAR; check how many irreducible k-points you have (head IBZKPT) and set KPAR accordingly
     [Diagram: three k-points k1, k2, k3 distributed round-robin over the k-point groups:
       KPAR = 1: one group works through k1, k2, k3 in rounds R1-R3 (t = 3) [OK]
       KPAR = 2: k1 and k2 in round R1, then k3 alone in round R2 with one group idle (t = 2) [Bad]
       KPAR = 3: all three k-points done in round R1 (t = 1) [Good]]
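     The same bookkeeping with made-up numbers, as a hedged worked example: for 8 irreducible k-points, KPAR = 4 needs ceil(8/4) = 2 rounds with every group busy in both, whereas KPAR = 3 needs ceil(8/3) = 3 rounds and leaves one of the three groups idle in the last; choosing KPAR to divide (or nearly divide) the irreducible k-point count avoids wasted rounds.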
  14. Parallelisation: NCORE and NPAR
     • NCORE: number of cores in each band group; NPAR: number of bands treated simultaneously
     • NCORE = <#cores> / NPAR
     • Why not the default (NCORE = 1, NPAR = <#cores>)? More band groups (probably) increases memory pressure and incurs a substantial communication overhead
     [Benchmark chart: observed speed-ups of 7.08x, 6.41x and 6.32x]
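     A concrete (hypothetical) illustration of the relation: on 4 nodes of 24 cores, i.e. 96 cores in total, NPAR = 4 gives NCORE = 96 / 4 = 24, i.e. one band group per node, whereas the default (NCORE = 1, NPAR = 96) creates 96 single-core band groups.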
  15. Parallelisation: NCORE and NPAR
     • WARNING: VASP will increase the default NBANDS to the nearest multiple of the number of band groups
     • Since the electronic minimisation scales as a power of NBANDS, this can backfire in calculations with a large NPAR (e.g. those requiring NPAR = <#cores>)
     • Default NBANDS: NELECT/2 + NIONS/2 (non-spin-polarised) or (3/5) NELECT + NMAG (spin-polarised)
     • Example system: 238 atoms w/ 272 electrons; default NBANDS = 455

       Cores   NBANDS (default)   NBANDS (adjusted)
       96      455                480
       128     455                512
       192     455                576
       256     455                512
       384     455                768
       512     455                512
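     Making the adjustment explicit with the numbers in the table (assuming the default NPAR = <#cores>, i.e. NCORE = 1): on 192 cores the 455 default bands are rounded up to the next multiple of 192, ceil(455/192) x 192 = 3 x 192 = 576, roughly a 27 % increase in NBANDS and correspondingly more work per SCF step.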
  16. Parallelisation: Memory
     • KPAR: the current implementation does not distribute data over k-point groups, so KPAR = N will use N times more memory than KPAR = 1
     • NPAR/NCORE: data is distributed over band groups, so decreasing NPAR (increasing NCORE) by a factor of N will reduce the memory requirements by N times
     • NPAR takes precedence over NCORE; if you use “master” INCAR files, make sure you don’t define both
     • The defaults for NPAR/NCORE (NPAR = <#cores>, NCORE = 1) are usually a poor choice for both memory and performance
     • Band parallelism for hybrid functionals has been supported since VASP 5.3.5; for memory-intensive calculations, it is a good alternative to underpopulating nodes
     • LPLANE: distributes data over plane-wave coefficients and speeds things up by reducing communication during FFTs; the default is LPLANE = .TRUE., and it should only need to be changed for massively-parallel architectures (e.g. BG/Q)
  17. Parallelisation: ScaLAPACK
     • RMM-DIIS (ALGO = VeryFast | Fast) involves three steps:
       o EDDIAG: subspace diagonalisation
       o RMM-DIIS: electronic minimisation
       o ORTHCH: wavefunction orthogonalisation

       Routine     312 atoms        624 atoms         1,248 atoms        1,872 atoms
       EDDIAG      2.90 (18.64 %)   12.97 (22.24 %)   75.26 (26.38 %)    208.29 (31.31 %)
       RMM-DIIS    12.39 (79.63 %)  42.73 (73.27 %)   187.62 (65.78 %)   379.80 (57.10 %)
       ORTHCH      0.27 (1.74 %)    2.62 (4.49 %)     22.36 (7.84 %)     77.11 (11.59 %)

     • EDDIAG and ORTHCH formally scale as N^3, and rapidly begin to dominate the SCF cycle time for large calculations
     • A good ScaLAPACK library can improve the performance of these routines in massively-parallel calculations
     • See also: https://www.nsc.liu.se/~pla/blog/2014/01/30/vasp9k/
  18. Parallelisation: My “rules of thumb”
     • For x86_64 IB systems (Archer, Balena, Neon…):
       o Use KPAR in preference to NPAR
       o Set NPAR = <#nodes>/KPAR, or NCORE = <#cores/node>
       o Use 1 node/band group per 50 atoms; you may want to use 2 nodes/50 atoms for hybrids, or decrease to ½ node per band group for < 10 atoms
       o ALGO = Fast is usually a good choice, except for badly-behaved systems
       o Leave LPLANE at the default (.TRUE.)
     • For the IBM BG/Q (STFC Hartree):
       o The Hartree machine currently uses VASP 5.2.x, so there is no KPAR
       o Try to choose a square number of cores, and set NPAR = sqrt(<#cores>)
       o Consider setting LPLANE = .FALSE. if <#cores> ≥ NGZ
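     Applying the x86_64 rules of thumb to a hypothetical job, say roughly 100 atoms with 4 irreducible k-points running on 4 nodes of 24 cores, a reasonable starting INCAR fragment might be (all numbers are assumptions to adapt):

       KPAR   = 4        ! one k-point group per node (4 irreducible k-points, 4 nodes)
       NCORE  = 24       ! = <#cores/node>, i.e. one band group per node
       ALGO   = Fast
       LPLANE = .TRUE.   ! the default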