The 2D Ising Model on GPU Clusters

I gave this talk at the DPG Spring Meeting 2010 in Regensburg, in the division "Dynamics and Statistical Physics".



  1. The 2D Ising Model on GPU Clusters
     Benjamin Block, University of Mainz, Institute for Physics
     Thanks to: Tobias Preis, Peter Virnau
  2. Overview
     • GPUs: optimized for massively parallel processing
     • Previous work: GPU-accelerated Ising model
     • Architecture-specific optimization
     • GPU clusters are becoming established – a multi-GPU implementation is useful
     T. Preis, P. Virnau, W. Paul, J. J. Schneider: GPU Accelerated Monte Carlo Simulation of the 2D and 3D Ising Model, J. Comp. Phys. 228 (2009)
  3. Ising Model (Ferromagnetism)
     Lattice of spins; snapshots for T >> T_C, T ~ T_C, and T << T_C.
  4. Metropolis Monte Carlo
     Perform successive spin flips!
     Flip probability: Metropolis criterion
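The Metropolis criterion on the slide can be sketched as a plain CPU reference implementation for a 2D lattice with periodic boundaries (illustrative only; the talk's actual code runs on the GPU, and the coupling J = 1 here is an assumption):

```python
import numpy as np

def metropolis_sweep(spins, beta, rng):
    """One sweep of single-spin-flip Metropolis updates on a 2D Ising
    lattice of +/-1 spins with periodic boundaries (CPU sketch)."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(0, L, size=2)
        # Sum of the four nearest neighbours (periodic boundaries).
        nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nn  # energy change of flipping (i, j), J = 1
        # Metropolis criterion: accept with probability min(1, exp(-beta*dE)).
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] = -spins[i, j]
    return spins
```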
  5. Parallelization of Metropolis Updates
     Idea: update non-interacting domains in parallel
     Checkerboard update
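The checkerboard idea rests on the fact that spins of one sublattice colour have all their neighbours on the other colour, so an entire half-lattice can be updated at once. A vectorized NumPy sketch (each site would be one GPU thread in the real implementation):

```python
import numpy as np

def checkerboard_sweep(spins, beta, rng):
    """One checkerboard Metropolis sweep: update all 'black' sites
    simultaneously, then all 'white' sites. Valid because same-colour
    sites never neighbour each other. Illustrative CPU sketch."""
    L = spins.shape[0]
    ii, jj = np.indices((L, L))
    for parity in (0, 1):
        mask = (ii + jj) % 2 == parity
        # Neighbour sums via periodic shifts of the whole lattice.
        nn = (np.roll(spins, 1, axis=0) + np.roll(spins, -1, axis=0)
              + np.roll(spins, 1, axis=1) + np.roll(spins, -1, axis=1))
        dE = 2.0 * spins * nn
        accept = (dE <= 0) | (rng.random((L, L)) < np.exp(-beta * dE))
        spins = np.where(mask & accept, -spins, spins)
    return spins
```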
  6. Programming the GPU
     Execute the same code for different data in parallel.
     Utilize different kinds of memory:
     • Slow global memory: store the spin lattice
     • Fast shared memory: use for local computations
  7. Reduce slow memory access
     Idea: store 4x4 spin blocks in 1 unit of GPU memory.
     Each parallel thread accesses 16 spins with one memory lookup and performs local computations in (fast) shared memory.
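One way to hold a 4x4 spin block in a single memory word is to pack one bit per spin, so a single lookup fetches all 16 spins. This sketch illustrates the packing idea in Python; the slide's exact GPU encoding may differ:

```python
def pack_block(block):
    """Pack a 4x4 block of +/-1 spins (list of 4 row lists) into one
    16-bit integer, bit = 1 for spin up, row-major order."""
    word = 0
    for idx, s in enumerate(sum(block, [])):  # flatten row-major
        if s == 1:
            word |= 1 << idx
    return word

def unpack_block(word):
    """Inverse of pack_block: recover the 4x4 block of +/-1 spins."""
    return [[1 if word >> (4 * r + c) & 1 else -1 for c in range(4)]
            for r in range(4)]
```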
  8. Update scheme in shared memory
     Integer array in shared memory; perform computations (draw random number, evaluate Metropolis criterion).
     New spins = old spins XOR update pattern
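With spins packed as bits, all accepted flips in a block can be applied in a single bitwise operation: build an update pattern from the per-spin Metropolis decisions, then XOR it onto the old spins. A minimal sketch of this scheme (function names are illustrative):

```python
def build_flip_mask(accepted):
    """Build a 16-bit update pattern from 16 accept/reject decisions,
    one per spin in the packed block."""
    mask = 0
    for idx, a in enumerate(accepted):
        if a:
            mask |= 1 << idx
    return mask

def xor_update(word, flip_mask):
    """New spins = old spins XOR update pattern: every set bit in the
    mask flips one spin, all 16 spins in one bitwise operation."""
    return word ^ flip_mask
```

XOR is its own inverse, so applying the same pattern twice restores the original spins.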
  9. Performance measurement
     How to measure performance? Single spin flips per time unit!
     Fair comparison: heavily optimized CPU implementation.
     Compared: CPU previous, CPU optimized, GPU previous, GPU optimized (~20x, ~200x)
  10. Multi-GPU communication
     Distribute the spin lattice among many GPUs.
     Border information has to be passed between GPUs after each complete update step.
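The border exchange can be sketched with a 1D strip decomposition and periodic boundaries: each strip carries a one-row halo at top and bottom that is refilled from the neighbouring strips after every update step. In the real multi-GPU code this would be a device-to-device or MPI transfer; the decomposition here is an illustrative assumption:

```python
import numpy as np

def exchange_borders(strips):
    """Refill each strip's halo rows (row 0 and row -1) from the
    interior border rows of the neighbouring strips, with periodic
    boundaries across the list of strips."""
    n = len(strips)
    for k, strip in enumerate(strips):
        strip[0, :] = strips[(k - 1) % n][-2, :]   # top halo <- upper neighbour's last interior row
        strip[-1, :] = strips[(k + 1) % n][1, :]   # bottom halo <- lower neighbour's first interior row
    return strips
```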
  11. Multi-GPU Performance
     Measure: single spin flips per GPU.
     Communication overhead is the bottleneck for small system sizes.
  12. Simulation on GPU Clusters
     • On 64 GPUs: 256 GB video memory!
     • A lattice of 800,000 x 800,000 spins could be processed.
     • Processing the whole lattice on 64 GPUs: 3 seconds!
     Tesla S1070 units at the NEC Nehalem Cluster Stuttgart (128 GPUs)
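A quick footprint check shows why a compact spin encoding matters at this scale. Assuming an illustrative 1 bit per spin (the slides' exact encoding may differ), an 800,000 x 800,000 lattice needs 80 GB and fits comfortably into the 256 GB of 64 GPUs, whereas one byte per spin would need 640 GB and would not:

```python
def lattice_memory_gb(linear_size, bits_per_spin):
    """Memory footprint in GB (10^9 bytes) of an L x L spin lattice
    stored at the given number of bits per spin."""
    return linear_size ** 2 * bits_per_spin / 8 / 1e9
```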
  13. Conclusion
     • Optimization is important (CPU and GPU) for a fair comparison
     • The 2D Ising model is a good candidate for parallel processing on GPU clusters
     • Submitted for publication in Computer Physics Communications
     • Source code will be made available at www.tobiaspreis.de
