Molecular models, threads and you

    Molecular models, threads and you - Presentation Transcript

    1. Molecular Models, Threads and You
       Optimizing the TINKER classical molecular dynamics code while maintaining code readability
       Jiahao Chen, Martínez Group
       Dept. Chemistry, CATMS, MRL and Beckman
       CS 498 MG presentation: 2007-12-07
    2. Molecular models/force fields
       Typical energy function: E = covalent bond effects + noncovalent interactions
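    3. Molecular models/force fields
       Typical energy function:
       $$E = \sum_{b \in \text{bonds}} k_b (r_b - r_{\text{eq},b})^2 + \sum_{a \in \text{angles}} \kappa_a (\theta_a - \theta_{\text{eq},a})^2 + \sum_{d \in \text{dihedrals}} \sum_n l_{nd} \cos(n\pi) + \sum_{i<j \in \text{atoms}} \frac{q_i q_j}{r_{ij}} + \sum_{i<j \in \text{atoms}} \epsilon_{ij} \left[ \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{12} - \left( \frac{\sigma_{ij}}{r_{ij}} \right)^{6} \right]$$
       Terms, in order: bond stretch, angle torsion, dihedrals, electrostatics, dispersion.
       The electrostatics and dispersion sums run over all atom pairs: computation cost = O(N²)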
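The O(N²) cost on slide 3 comes from visiting every atom pair in the electrostatics and dispersion sums. A minimal sketch of that double loop follows, written in free-form Fortran rather than the fixed form used on the slides. The program name, the three-atom geometry, the charges, and the single `eps`/`sigma` pair are all made up for illustration and are not taken from TINKER.

```fortran
! Sketch of the O(N^2) nonbonded energy: Coulomb plus 12-6 dispersion.
! All names and parameter values are illustrative, not taken from TINKER.
program pair_energy
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 3
  real(dp) :: x(3,n), q(n)
  real(dp) :: eps, sigma, e_coul, e_disp, rij, sr6
  integer :: i, j

  ! three atoms on a line with unit spacing (arbitrary units)
  x = 0.0_dp
  x(1,2) = 1.0_dp
  x(1,3) = 2.0_dp
  q = (/ 1.0_dp, -1.0_dp, 1.0_dp /)
  eps = 1.0_dp      ! one prefactor for every pair (assumption)
  sigma = 1.0_dp    ! one length scale for every pair (assumption)

  e_coul = 0.0_dp
  e_disp = 0.0_dp
  ! each unordered pair i<j is visited once: n*(n-1)/2 iterations
  do i = 1, n - 1
     do j = i + 1, n
        rij = sqrt(sum((x(:,i) - x(:,j))**2))
        e_coul = e_coul + q(i)*q(j)/rij
        sr6 = (sigma/rij)**6
        e_disp = e_disp + eps*(sr6*sr6 - sr6)
     end do
  end do

  print '(a,f12.6)', 'electrostatics: ', e_coul
  print '(a,f12.6)', 'dispersion:     ', e_disp
end program pair_energy
```

The bonded terms, by contrast, loop over fixed lists of bonds, angles, and dihedrals, so their cost grows only linearly with system size.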
    4. Problem description
       • The state of the system is given by the position and momentum of every atom (of mass $m_i$):
         $(x_1, p_1, x_2, p_2, \cdots, x_N, p_N) \in \mathbb{R}^{3 \times 2 \times N}$
       • Solve the system of partial differential equations
         $\frac{\partial x_i}{\partial t} = \frac{p_i}{m_i}, \quad \frac{\partial p_i}{\partial t} = -\frac{\partial E}{\partial x_i}, \quad i = 1, \cdots, N,$
       • with user-specified initial conditions (e.g. with constant temperature and pressure)
       • subject to (user-specified) constraints, e.g. fixed bond angles
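One common way to integrate the equations of motion on slide 4 is a velocity Verlet step, which splits the momentum update into two half steps around a force evaluation. Whether the runs in this talk used exactly that integrator is an assumption. The sketch below applies it to a one-dimensional harmonic oscillator, E = k x²/2, with made-up values for `m`, `k`, and `dt`.

```fortran
! Velocity Verlet sketch for a 1-D harmonic oscillator, E = 0.5*k*x**2.
! Illustrative only: the integrator choice and all values are assumptions.
program verlet_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  real(dp), parameter :: m = 1.0_dp, k = 1.0_dp, dt = 0.01_dp
  real(dp) :: x, p, f, e0, e1
  integer :: step

  x = 1.0_dp                        ! initial position
  p = 0.0_dp                        ! initial momentum
  f = -k*x                          ! force = -dE/dx
  e0 = 0.5_dp*p*p/m + 0.5_dp*k*x*x  ! initial total energy

  do step = 1, 1000
     p = p + 0.5_dp*dt*f            ! first half-step momentum update
     x = x + dt*p/m                 ! position update: dx/dt = p/m
     f = -k*x                       ! recompute the force at the new position
     p = p + 0.5_dp*dt*f            ! second half-step momentum update
  end do

  e1 = 0.5_dp*p*p/m + 0.5_dp*k*x*x
  print '(a,es12.4)', 'relative energy drift: ', abs(e1 - e0)/e0
end program verlet_sketch
```

The force evaluation in the middle of the loop is the only place the energy function enters, which is why it dominates the run time for a many-atom system.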
    5. Many parallel and serial implementations
       Table columns: Package name, Threads, MPI, Global Arrays. Entries per package, in slide order:
       NAMD       CHARM++
       GROMACS    ✓ ✓
       TINKER     (no entries)
       AMBER      partly ✓ ✓
       CHARMM     ✓
       LAMMPS     ✓
       NWChem     ✓ ✓
    6. Things I tried
       • Compiler flags optimization
       • Cache miss reduction
       • Lookup tables
       • Parallelization with OpenMP
    7. Compiler flag optimization
       flags        gfortran 4.1.2                   ifort 10.0.023
       -O0          29.95(2) s   -                   36.30(2) s   -
       -Os          29.92(3) s   +0.77(3) %          32.59(4) s   +10.22(2) %
       -O1          30.22(1) s   -0.90(4) %          32.12(3) s   +11.51(1) %
       -O2          29.66(3) s   +0.96(1) %          30.30(2) s   +16.54(2) %
       -O3          29.84(2) s   +0.38(2) %          30.83(2) s   +15.06(2) %
       CE search    28.77(2) s   +3.62(3) % [1]      28.96(2) s   +20.22(1) % [2]
       [1] FFLAGS="-falign-functions -falign-jumps -falign-labels -falign-loops -fvpt -fcse-skip-blocks -fdelete-null-pointer-checks -ffast-math -fforce-addr -fgcse -fgcse-lm -fgcse-sm -floop-optimize -fkeep-static-consts -fmerge-constants -fno-defer-pop -fno-guess-branch-probability -fno-math-errno -funsafe-math-optimizations -fno-trapping-math -foptimize-register-move -fregmove -freorder-blocks -freorder-functions -frerun-cse-after-loop -fno-sched-spec -fsched-spec-load -fsched-stalled-insns -fsignaling-nans -fsingle-precision-constant -fstrength-reduce -fthread-jumps -funroll-all-loops"
       [2] FFLAGS="-xN -no-prec-div -static -inline-level=1 -ip -fno-alias -fno-fnalias -fno-omit-frame-pointer -fkeep-static-consts -nolib-inline -heap-arrays 1 -pad -O3 -scalar-rep -funroll-loops -complex-limited-range"
    8. Algorithm and time profile
       [Flowchart with time profile; N=6, gfortran 4.1.2]
       Main program: Initialize model and parameters; then, for each time step (>98% of run time): Remove unphysical motions, Move one time step, Flush I/O; End.
       Move one time step: Update state by Δt/2; Calculate potential energy and forces (>59%, O(N²)); Update state by Δt/2; Calculate & record kinetic energy and properties; Enforce temp. & pressure (twice). Remaining annotations on this level: <31%, O(N).
       Calculate potential energy and forces: Calculate bond interactions (9%), angle interactions (12%), dihedral interactions (8%), dispersion interactions (37%), charge interactions (26%), ..., Add up all components.
    9. An unexpected cost
       [Flowchart from slide 8, with a callout on "Add up all components"]
       Q: Why is 15% of total execution time spent adding numbers!?
    10. A: many L2 cache misses
        (L2 cache misses per source line are given in the trailing comments)

        c     zero out each of the first derivative components
              do i = 1, n                          ! 7 misses
                 do j = 1, 3
                    deb(j,i) = 0.0d0               ! 42 misses
                    ...                            ! 22 other terms
                 end do
              end do
              ...
        c     sum up to get the total energy and first derivatives
              energy = eb + ...
              do i = 1, n
                 do j = 1, 3
                    desum(j,i) = deb(j,i) + ...    ! 19 misses (22 other terms)
                    derivs(j,i) = desum(j,i)       ! 2 misses
                 end do
              end do

        70 of 91 cache misses per time step (n = 6) shown
    11. A simple solution
        (L2 cache misses per source line in the trailing comments, before -> after)

        c     zero out each of the first derivative components
              do i = 1, n                          ! 7 misses
                 do j = 1, 3
                    deb(j,i) = 0.0d0               ! 42 -> 26 misses
                    ...
                 end do
              end do
              ...
        c     sum up to get the total energy and first derivatives
              energy = eb + ...
              do i = 1, n
                 do j = 1, 3
                    temp = deb(j,i) + ...          ! 6 misses
                    desum(j,i) = temp              ! 19 -> 1 miss
                    derivs(j,i) = temp             ! 2 -> 1 miss
                 end do
              end do

        reduced cache misses from 92 to 41 per time step
    12. Speedup from reducing L2 cache misses
        flags                     gfortran 4.1.2   ifort 10.0.023
        original                  29.95(2) s       28.96(2) s
        with scalar replacement   27.43(3) s       28.95(1) s
        speedup                   +8.44(1) %       +0.03(2) %
        ifort already called with scalar replacement flag
    13. Lookup tables (LUTs)
        • Calculations of sqrt() and exp() take up 23.8% of execution time
        • Idea: pre-compute values of sqrt() and exp() in an array and recall them from memory when needed
        • Caution: LUT should not displace too much data from L2 cache
    14. sqrt() with LUT
        [Figure panels: direct LUT; LUT with linear interpolation]
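The two variants named on slide 14 can be sketched as follows. The table size `nt` and the range `[xmin, xmax]` are hypothetical and are not the values used in the talk. The direct lookup returns the nearest tabulated value, while the interpolated lookup blends the two entries that bracket `x`.

```fortran
! Sketch of a sqrt() lookup table: direct lookup vs. linear interpolation.
! Table size and range are hypothetical, not the values used in the talk.
program sqrt_lut
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nt = 1024
  real(dp), parameter :: xmin = 1.0_dp, xmax = 4.0_dp
  real(dp) :: table(0:nt), h, x, t, direct, interp
  integer :: i

  ! pre-compute sqrt() on a uniform grid covering [xmin, xmax]
  h = (xmax - xmin)/nt
  do i = 0, nt
     table(i) = sqrt(xmin + i*h)
  end do

  x = 2.0_dp                       ! must lie inside [xmin, xmax]
  t = (x - xmin)/h                 ! fractional table index

  ! direct LUT: return the nearest tabulated value
  direct = table(nint(t))

  ! LUT with linear interpolation between the two bracketing entries
  i = min(int(t), nt - 1)
  t = t - i
  interp = (1.0_dp - t)*table(i) + t*table(i + 1)

  print '(a,es10.2)', 'direct LUT error:       ', abs(direct - sqrt(x))
  print '(a,es10.2)', 'interpolated LUT error: ', abs(interp - sqrt(x))
end program sqrt_lut
```

Interpolation costs one extra table read and a few multiplications, in exchange for an error that shrinks with the square of the grid spacing rather than linearly.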
    15. exp() with LUT
        [Figure panels: direct LUT; LUT with first-order Taylor series refinement*]
        * $e^{x} = e^{x_0} + (x - x_0) e^{x_0} + O\left( (x - x_0)^2 \right)$
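A minimal sketch of the first-order Taylor refinement on slide 15: look up e^{x0} at the nearest grid point x0, then multiply by 1 + (x - x0). The table size `nt`, the range, and the test point are hypothetical choices made for this illustration only.

```fortran
! Sketch of an exp() lookup table with first-order Taylor refinement:
!   exp(x) ~ exp(x0) + (x - x0)*exp(x0), with x0 the nearest grid point.
! Table size, range, and test point are hypothetical.
program exp_lut
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: nt = 4096
  real(dp), parameter :: xmin = -10.0_dp, xmax = 0.0_dp
  real(dp) :: table(0:nt), h, x, x0, direct, refined
  integer :: i

  ! pre-compute exp() on a uniform grid covering [xmin, xmax]
  h = (xmax - xmin)/nt
  do i = 0, nt
     table(i) = exp(xmin + i*h)
  end do

  x = -1.2345_dp                   ! must lie inside [xmin, xmax]
  i = nint((x - xmin)/h)           ! index of the nearest grid point
  x0 = xmin + i*h

  direct = table(i)                             ! direct LUT
  refined = table(i)*(1.0_dp + (x - x0))        ! Taylor-refined LUT

  print '(a,es10.2)', 'direct LUT relative error:  ', abs(direct - exp(x))/exp(x)
  print '(a,es10.2)', 'refined LUT relative error: ', abs(refined - exp(x))/exp(x)
end program exp_lut
```

The refinement reuses the value already fetched from the table, because the derivative of exp() is exp() itself, so no second table is needed.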
    16. Choice of implementation
        function   desired precision   table size (doubles)   refinement   expected speedup
        sqrt()     10^-4               10,764                 none         +118%
        exp()      10^-8               6,836                  Taylor       +151%
        LUT aligned to 128 bits
        L2 cache = 4 MB = 512K doubles
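The cache figures on slide 16 can be checked with a short calculation, assuming 8-byte doubles and 1 MB = 2^20 bytes (the convention that reproduces the slide's 512K figure):

$$\frac{4 \times 2^{20}\ \text{bytes}}{8\ \text{bytes per double}} = 524{,}288 = 512\text{K doubles}$$

$$10{,}764 + 6{,}836 = 17{,}600\ \text{doubles} = 140{,}800\ \text{bytes} \approx 137.5\ \text{KiB}, \qquad \frac{17{,}600}{524{,}288} \approx 3.4\%$$

Under those assumptions the two tables together occupy roughly 3.4% of the L2 cache, which leaves most of it for the simulation data.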
    17. Speedup from LUT use
        flags                gfortran 4.1.2   ifort 10.0.023
        original             29.95(2) s       28.96(2) s
        with lookup tables   26.89(1) s       25.87(2) s
        speedup              +10.23(2) %      +7.22(3) %
    18. Summary of serial improvements
        Improvement               gfortran 4.1.2   ifort 10.0.023
        Best compiler flags       +3.62(3) %       +20.22(1) %
        L2 cache miss reduction   +8.44(2) %       +0.03(1) %
        Lookup tables             +10.23(1) %      +7.22(2) %
        Total                     +20.17(4) %      +26.00(2) %
        Total run time            23.91(3) s       26.86(2) s
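The totals on slide 18 are consistent with measuring speedup as the fractional reduction in run time relative to the -O0 timings on slide 7. That convention is inferred from the numbers and is not stated on the slides:

$$\text{speedup} = \frac{t_{\text{ref}} - t}{t_{\text{ref}}}$$

$$\text{gfortran: } \frac{29.95 - 23.91}{29.95} \approx 20.17\%, \qquad \text{ifort: } \frac{36.30 - 26.86}{36.30} \approx 26.0\%$$

The individual gfortran gains sum to 3.62 + 8.44 + 10.23 = 22.29%, slightly more than the measured total of 20.17%, so the improvements are not strictly additive.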
    19. Parallelization targets
        [Flowchart from slide 8 (N=6), repeated to mark the parallelization targets]
    20. Parallelization strategy
        [Diagram] "Calculate potential energy and forces" (100%) is wrapped in omp sections and split into two omp section blocks of 50% each.
        Each routine inside carries its own omp parallel do: Calculate charge interactions (50%), angle interactions (16%), dihedral interactions (2%), dispersion interactions (12%), bond interactions (11%), ..., Add up all components.
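The two-level scheme on slide 20 can be sketched with standard OpenMP directives: `sections` at the outer level to run independent energy terms side by side, and `parallel do` with a reduction inside each term. The routines `stretch_energy` and `dispersion_energy`, the array `r`, and its contents are hypothetical stand-ins, not TINKER routines.

```fortran
! Sketch of the two-level OpenMP scheme: sections outside, parallel do inside.
! Routine names and data are hypothetical stand-ins for TINKER's energy terms.
! Without OpenMP enabled, the directives are plain comments and the code runs serially.
program omp_sections_sketch
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 1000
  real(dp) :: r(n), e_stretch, e_disp
  integer :: i

  do i = 1, n
     r(i) = 1.0_dp + real(i, dp)/n      ! made-up distances in (1, 2]
  end do

  ! outer level: independent energy terms run as separate sections
  !$omp parallel sections
  !$omp section
  e_stretch = stretch_energy(r)
  !$omp section
  e_disp = dispersion_energy(r)
  !$omp end parallel sections

  print '(a,f14.6)', 'total energy: ', e_stretch + e_disp

contains

  function stretch_energy(r) result(e)
    real(dp), intent(in) :: r(:)
    real(dp) :: e, acc
    integer :: i
    acc = 0.0_dp
    ! inner level: iterations are shared among threads, partial sums combined
    !$omp parallel do reduction(+:acc)
    do i = 1, size(r)
       acc = acc + (r(i) - 1.0_dp)**2
    end do
    !$omp end parallel do
    e = acc
  end function stretch_energy

  function dispersion_energy(r) result(e)
    real(dp), intent(in) :: r(:)
    real(dp) :: e, acc
    integer :: i
    acc = 0.0_dp
    !$omp parallel do reduction(+:acc)
    do i = 1, size(r)
       acc = acc + (1.0_dp/r(i))**12 - (1.0_dp/r(i))**6
    end do
    !$omp end parallel do
    e = acc
  end function dispersion_energy

end program omp_sections_sketch
```

With gfortran the directives take effect when compiling with `-fopenmp`. Because the inner loops sit inside an already parallel region, many OpenMP runtimes execute them with a single thread unless nested parallelism is enabled, so the split between the two levels is worth measuring rather than assuming.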
    21. Parallelization results
        [Plot: execution time/s versus # cores, gfortran 4.1.2; series: N=6, N=1000, Ideal]
    22. Summary
        • Free software can sometimes be better than non-free software
        • L2 cache misses can significantly degrade performance
        • Lookup tables are an effective tradeoff between speed and memory vs. precision
        • Simple OpenMP parallelization is effective for small numbers of processors
