Determination of line tension in the 3D Ising model on GPUs

This is the talk I gave at the 2nd International Symposium
“Computer Simulations on GPU” (SimGPU 2013)

Published in: Technology, Sports


  1. Determination of line tension in the 3D Ising model on GPUs. Benjamin Block, Tobias Preis, David Winter, Suam Kim, Peter Virnau, Kurt Binder. University of Mainz, Institute for Physics. SimGPU 2013
  3. Topics Touched: 1. Ising Model on GPU 2. Line Tension Estimation
  4. Ising Model: Ordered, Random, Transition. + nearest neighbor interaction [formula not recoverable]
  5. Monte Carlo: Perform successive spin flips! Probability: Metropolis criterion. Inherently serial... but
  6. GPU Implementation • GPUs: massively parallel processing (T. Preis, P. Virnau, W. Paul, J. J. Schneider: GPU Accelerated Monte Carlo Simulation of the 2D and 3D Ising Model, J. Comp. Phys. 228 (2009)) • Architecture specific optimization • Multi GPU implementation
  7. Parallelization of Lattice Updates. Idea: Update non-interacting domains in parallel. Checkerboard Update
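The checkerboard idea from slides 5 to 7 can be sketched in a few lines. This is a minimal serial illustration in Python, not the CUDA code from the talk: the function name `checkerboard_sweep`, the 2D lattice, the coupling `J` and the list-of-lists layout are assumptions made for the example. The point is that all sites of one color have neighbors only on the other color, so the inner loops over one color could run in parallel.

```python
import math
import random

def checkerboard_sweep(spins, beta, rng, J=1.0):
    """One Metropolis sweep of an L x L periodic lattice (L even), one color at a time."""
    L = len(spins)
    for color in (0, 1):
        # Sites with (i + j) % 2 == color only have neighbors of the other
        # color, so they do not interact and could be updated in parallel.
        for i in range(L):
            for j in range(L):
                if (i + j) % 2 != color:
                    continue
                nb = (spins[(i + 1) % L][j] + spins[(i - 1) % L][j]
                      + spins[i][(j + 1) % L] + spins[i][(j - 1) % L])
                dE = 2.0 * J * spins[i][j] * nb  # energy change if this spin flips
                # Metropolis criterion: always accept downhill moves,
                # accept uphill moves with probability exp(-beta * dE).
                if dE <= 0 or rng.random() < math.exp(-beta * dE):
                    spins[i][j] = -spins[i][j]
    return spins
```

A call such as `checkerboard_sweep(lattice, beta=0.4, rng=random.Random(0))` performs one full sweep in place; the lattice size must be even so that the two colors tile the periodic lattice consistently.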
  11. Reduce slow memory access. Idea: Store spins in 128 bit (uint4) chunks. Access 128 spins with one memory lookup. Extract spins in local thread memory (registers) for computation. [Figure labels: uint4 blocks in global memory; one thread]
  17. Update scheme (uint4): Extract chunk in thread. Perform computations (draw random number, evaluate Metropolis criterion). Old spins XOR update pattern = new spins
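The storage and update scheme of slides 8 to 17 can be illustrated with plain integers. This is a sketch under assumptions: Python's arbitrary-precision `int` stands in for a `uint4`, and the helper names `pack_spins`, `unpack_spins` and `apply_update` are hypothetical. It only shows the data flow (one word holds 128 spins, the flip decisions are collected in an update pattern, and a single XOR produces the new spins).

```python
MASK128 = (1 << 128) - 1  # a uint4 holds 4 x 32 = 128 bits

def pack_spins(bits):
    """Pack 128 spins (0 = down, 1 = up) into one 128-bit word."""
    assert len(bits) == 128
    word = 0
    for k, b in enumerate(bits):
        word |= (b & 1) << k
    return word

def unpack_spins(word):
    """Extract the 128 spins from a word (done in registers on the GPU)."""
    return [(word >> k) & 1 for k in range(128)]

def apply_update(old_word, flip_decisions):
    """Collect per-spin flip decisions in an update pattern and XOR it in."""
    pattern = pack_spins([1 if f else 0 for f in flip_decisions])
    return (old_word ^ pattern) & MASK128
```

In this picture the Metropolis test for each spin produces one entry of `flip_decisions`; only the final XOR touches the packed word, which is what keeps the number of global memory accesses low.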
  21. Multispin Coding? • Multiple spins are coded in one memory unit (128 spins in 128 bits) • Computation is not done on the encoded spins in parallel but serially in each chunk • Multispin coding algorithms designed for CPUs were not efficient on GPU. Why??
  22. Multispin Coding
  27. Array of spins (1 bit = 1 spin). MC step: ? In advance: pooled random patterns; neighbors (bitwise); judgement function (for each energy level)
  29. Array of spins (1 bit = 1 spin). MC step: from the pool of random patterns, select one pattern randomly; construct update pattern
  31. Array of spins (1 bit = 1 spin) XOR update pattern = spins for next step
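The pooled multispin scheme of slides 23 to 31 can be sketched for the simplest possible case. Everything specific here is an assumption made for illustration: a 1D periodic chain instead of the 2D lattice, a 64-bit word, and the names `make_pool` and `multispin_step`. In 1D only one energy level needs a random pattern (both neighbors parallel, energy change 4J); with one or two antiparallel neighbors the flip is always accepted, so the "judgement function" reduces to one bitwise expression.

```python
import math
import random

N = 64                                       # spins per word (must be even)
FULL = (1 << N) - 1
EVEN = sum(1 << k for k in range(0, N, 2))   # 1D checkerboard mask
ODD = FULL ^ EVEN

def rotl(w):
    """Bit k of the result is bit k-1 of w (periodic)."""
    return ((w << 1) | (w >> (N - 1))) & FULL

def rotr(w):
    """Bit k of the result is bit k+1 of w (periodic)."""
    return ((w >> 1) | (w << (N - 1))) & FULL

def make_pool(beta, size, rng, J=1.0):
    """In advance: patterns whose bits are 1 with probability exp(-4*beta*J)."""
    p = math.exp(-4.0 * beta * J)
    return [sum(1 << k for k in range(N) if rng.random() < p) for _ in range(size)]

def multispin_step(spins, pool, rng):
    """Update all even sites, then all odd sites, with bitwise operations only."""
    for mask in (EVEN, ODD):
        anti_l = spins ^ rotl(spins)   # 1 where the left neighbor is antiparallel
        anti_r = spins ^ rotr(spins)   # 1 where the right neighbor is antiparallel
        r = rng.choice(pool)           # select one pooled pattern randomly
        # Flip if any neighbor is antiparallel (dE <= 0), otherwise only
        # where the pooled random pattern has a 1 (dE = 4J).
        update = (anti_l | anti_r | r) & mask
        spins ^= update                # old spins XOR update pattern
    return spins
```

The sketch also makes the downside visible: every word updated with the same pooled pattern shares its random bits, so a small pool means correlated decisions.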
  34. Downsides of Pooling • Impairs quality of simulation (the smaller the pool, the less random) • Low flexibility (external fields...) • Relies on a lot of precomputation and random memory lookups (GPU killer)
  40. Performance (2D Ising, results from 2011). Bars: CPU simple, CPU multispin coding, GPU simple, GPU optimized. Annotations: ~20x; ~200x; 8x, still one core! GPU: NVIDIA Tesla S1070. CPU: Intel i7 (2.67 GHz, 1 core)
  41. Simulation on multiple GPUs: Spread spin lattice over many GPUs in different machines. Exchange border information between machines via MPI
  42. Simulation Domains per GPU, Border Arrays
  43. Multi-GPU Performance. Measure: Single spin flips per GPU. Communication overhead: Bottleneck for small system sizes
  44. • 64 GPUs: 256 GB video memory • Enough for a lattice of 800,000 x 800,000 spins • One lattice sweep: 3 seconds on pre-Fermi (S1070) hardware
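The numbers on slide 44 can be checked with a short calculation. The assumptions are labeled in the comments: one bit per spin (as in the uint4 storage scheme), decimal gigabytes, and an even split of the lattice over the GPUs. The variable names are chosen for this example only.

```python
n_gpus = 64
total_memory_bytes = 256 * 10**9          # 256 GB, read as decimal gigabytes (assumption)
lattice_side = 800_000
n_spins = lattice_side * lattice_side     # 6.4e11 spins

# Assumption: 1 bit per spin, as in the 128-spins-per-uint4 storage scheme.
needed_bytes = n_spins // 8               # 8e10 bytes = 80 GB
per_gpu_bytes = needed_bytes / n_gpus     # 1.25 GB per GPU for an even split

# One sweep in 3 seconds, summed over all 64 GPUs.
flips_per_second = n_spins / 3.0          # about 2.1e11 spin flips per second

print(f"lattice needs {needed_bytes / 1e9:.0f} GB of {total_memory_bytes / 1e9:.0f} GB")
print(f"per GPU: {per_gpu_bytes / 1e9:.2f} GB")
print(f"aggregate rate: {flips_per_second:.2e} spin flips/s")
```

Under these assumptions the spin data alone occupies well under the available memory, which is consistent with the slide's "enough for" wording; the slide does not say what the remaining memory is used for.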
  46. ? OpenCL??
  47. Platform independence
  48. Kernels. Idea: Hide language differences in macros
  49. Macros expand to different expressions on each platform • CUDA (Driver API) • OpenCL • Host C
  50. Initialization • Initialize • Load “Device Programs” (Kernels) from source • Create Data Containers that take care of data
  51. Run kernel with parameters. Use data on host
  52. Cross platform performance (3D Ising example). CPU: i7 Nehalem. Nvidia: Geforce GTX580. AMD: HD 6970
  58. Results • Downside: Lowest common denominator (CUDA has a lot more features by now) • No explicit copying needed (container's job) • In our case: OpenCL was 10% slower on NVIDIA card (Geforce GTX580) • slower on comparable AMD card (Radeon HD 6970) • Take this with a grain of salt
  59. Nucleation
  60. Nucleation phenomena • Nucleation important in materials research, atmosphere, etc.
  62. Nucleation: Phase 1, Phase 2. Induced by nuclei!
  63. Most spins up, Most spins down
  64. Heterogeneous Nucleation: Wall attached droplet
  66. Simulation in the Ising Model. Winter D., Virnau P., Binder K., PRL Volume 103 Issue 22 (2009)
  67. Young
  68. Free Energy of Droplet: H=0, Θ=90°. Winter D., Virnau P., Binder K., PRL Volume 103 Issue 22 (2009)
  69. Young
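The slides titled "Young" carry no formula in this transcript. For orientation, Young's equation in its standard textbook form is given below; the symbols are assumptions of this note and not taken from the slides: $\gamma_{w1}$ and $\gamma_{w2}$ are the wall tensions against the two coexisting phases, $\gamma_{12}$ is the interfacial tension between the phases, and $\theta$ is the contact angle.

```latex
\gamma_{w1} - \gamma_{w2} = \gamma_{12}\,\cos\theta
```

With this form, the case H=0, Θ=90° on slide 68 is the symmetric one: without a surface field the wall does not prefer either Ising phase, so $\gamma_{w1} = \gamma_{w2}$ and $\cos\theta = 0$.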
  71. Line Contribution
  76. A different method... Surface field H > 0 which tilts interface. Antiperiodic Boundary Conditions force and stabilize an interface. Angle is limited by geometry...
  77. Flatten geometry (Lx, Ly, Lz): Flattened geometry in dimension X allows for stronger tilt
  78. Boundary Condition Implementation: Simulate one extra chunk in each dimension
  79. Boundary Condition Implementation. Periodic: Exchange borders
  80. Boundary Condition Implementation. APBC: Read, XOR 1, Write
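The boundary handling on slides 78 to 80 can be sketched on a single row of spins. This is an illustration under assumptions, not the talk's implementation: one ghost cell per side stands in for the "extra chunk", spins are stored as bits (0 = down, 1 = up), and the function name `fill_ghost_cells` is hypothetical. Periodic boundaries copy the opposite border; antiperiodic boundaries (APBC) copy it with the bit inverted, which is the "Read, XOR 1, Write" step.

```python
def fill_ghost_cells(row, antiperiodic=False):
    """Return the row with one ghost cell added on each side.

    Periodic: ghost cells are copies of the opposite border.
    Antiperiodic: the copy is inverted (read, XOR 1, write).
    """
    flip = 1 if antiperiodic else 0
    left_ghost = row[-1] ^ flip    # read the opposite border, optionally invert
    right_ghost = row[0] ^ flip
    return [left_ghost] + row + [right_ghost]

def broken_bonds(padded):
    """Count antiparallel nearest-neighbor pairs between cell 0 and the last real cell."""
    # The bond to the right ghost cell is the same physical bond as the one
    # to the left ghost cell, so it is left out to avoid double counting.
    return sum(padded[k] ^ padded[k + 1] for k in range(len(padded) - 2))
```

For a fully ordered row, `broken_bonds(fill_ghost_cells(row))` is 0 with periodic boundaries and 1 with antiperiodic ones, which is how APBC force an interface into the system.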
  81. Thermodynamic integration • Vary box size in all dimensions • Measure Free Energies of surfaces by integration over magnetization
  83. • Expressions (1), (2), (3) can be derived for the Free Energy differences in each dimension • Young’s Equation • Combination of the first two expressions allows extraction of Line Tension
  84. • Which can be combined to an expression for the line tension: (1) (2) (3)
  85. Putting it together
  86. Kim et al. (2011), T=3.0
  87. Density Profile: Side view, Top view. 3D System: 56x120x120 spins
  94. Conclusion • Direct method to measure line tension for tilted surfaces • Our first real world use of the Ising Model on GPUs • Optimization is important (CPU and GPU) for fair comparison • Platform independence is possible (useful?) • The Ising model is a good candidate for parallel processing on GPU clusters
  95. Publications • Monte Carlo Test of the Classical Theory for Heterogeneous Nucleation Barriers. Winter D., Virnau P., Binder K., Phys. Rev. Lett. 103, 22 (2009) • Multi-GPU Accelerated Multi-Spin Monte Carlo Simulations of the 2D Ising model. Block B., Virnau P., Preis T., Computer Physics Communications, Volume 181, Issue 9 (2010) • Monte Carlo Methods for Estimating Interfacial Free Energies and Line Tensions. Binder K., Block B., Das S. K., Virnau P., Winter D., J. Stat. Phys. (2011) • Platform independent, efficient implementation of the Ising model on parallel acceleration devices. Block B. J., Eur. Phys. J. Spec. Top. (2012)
