GPU Systems


Our unique 1U GPU servers allow you to use the latest GPUs (Tesla, GTX285, Quadro FX5800) for visualization or offloading processing in a small form factor. These are built on Intel's latest Nehalem processors.

  1. GPU Systems
     Advanced Clustering's offerings for GPGPU computing
     Advanced Clustering Technologies • 866.802.8222
  2. what is GPU computing
     • The use of a GPU (graphics processing unit) to do general-purpose scientific and engineering computing
     • The model is to use a CPU and GPU together in a heterogeneous computing model
     • The CPU runs the sequential portions of the application
     • Parallel computation is offloaded onto the GPU
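The heterogeneous model above can be sketched as a minimal CUDA program (an illustration, not Advanced Clustering code; the `saxpy` kernel, array size, and block size are assumptions for the example). The host CPU does the sequential setup, then offloads the data-parallel loop to the GPU:

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Device code: each GPU thread computes one element of y = a*x + y.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    // Sequential portion runs on the CPU (host): allocate and initialize.
    float *hx = (float *)malloc(bytes), *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Copy inputs into the GPU's dedicated RAM, launch, copy results back.
    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // Parallel portion offloaded to the GPU: one thread per element.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);

    cudaFree(dx); cudaFree(dy); free(hx); free(hy);
    return 0;
}
```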
  3. history of GPUs
     • GPUs were designed with fixed-function pipelines for real-time 3D graphics
     • As the complexity of GPUs increased, they were made more programmable to ease implementing new features
     • Scientists and engineers discovered that these purpose-built GPUs could also be re-programmed for General-Purpose computing on a GPU (GPGPU)
  4. history of GPUs - continued
     • The nature of 3D graphics means GPUs have very fast floating-point units, which are also great for scientific codes
     • Though originally very difficult to program, GPU vendors have realized another market for their products and developed specially designed GPUs and programming environments for scientific computing
     • Most prominent are the NVIDIA Tesla GPU and its CUDA programming environment
  5. GPUs vs. CPUs
     • Traditional x86 CPUs are available today with 4 cores; 6-, 8-, and 12-core parts are coming in the future
     • NVIDIA's Tesla GPU is shipping with 240 cores
     [figure: quad-core CPU vs. 240-core Tesla GPU]
  6. GPUs vs. CPUs - continued
  7. why use GPUs?
     • Massively parallel design: 240 cores per GPU
     • Nearly 1 teraflop of single-precision floating-point performance
     • Designed as an accelerator card to add into your existing system - it does not replace your current CPU
     • Maximum of 4GB of fast dedicated RAM per GPU
     • If your code is highly parallel, it's worth investigating
  8. why not use GPUs?
     • Fixed RAM sizes on the GPU - not upgradable or configurable
     • Large power requirements of 188W
     • Still requires a host server and CPU to operate
     • Specialized development tools are required; it does not run standard x86 code
     • Current development tools are specific to NVIDIA cards - no support for other manufacturers' GPUs
     • Your code may be difficult to parallelize
  9. developing for GPUs
     • Current development model: the CUDA parallel environment
     • The CUDA parallel programming model guides programmers to partition the problem into coarse sub-problems that can be solved independently in parallel
     • Fine-grained parallelism within each sub-problem is then expressed so that the sub-problem can be solved cooperatively in parallel
     • Currently an extension to the C programming language - other languages are in development
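The two levels of decomposition described above can be illustrated with a common CUDA kernel pattern (a sketch under assumptions, not from the slides; the kernel name and the block size of 256 are illustrative). Each thread block independently solves one coarse sub-problem - summing its own chunk of the input - while the threads within the block cooperate through fast on-chip shared memory to solve that sub-problem in parallel:

```cuda
#include <cuda_runtime.h>

// Coarse decomposition: each block sums one independent chunk of the input.
// Fine-grained cooperation: threads in the block combine their partial
// values through shared memory.
__global__ void block_sum(const float *in, float *out, int n) {
    __shared__ float tile[256];           // one slot per thread in the block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Cooperative tree reduction within the block.
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }

    // One partial sum per block; the host (or a second kernel) combines them.
    if (threadIdx.x == 0) out[blockIdx.x] = tile[0];
}
```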
  10. NVIDIA GPUs
     • All of NVIDIA's recent GPUs support CUDA development
     • Tesla cards are designed exclusively for CUDA and GPGPU code (no graphics support)
     • GeForce cards designed for graphics can be used for CUDA code as well
     • Usually slower, with fewer cores or less RAM - but a great way to get started at low price points
     • Development and testing can be done on almost any standard GeForce GPU and then run on a Tesla system
  11. GeForce vs. Tesla
  12. GPU future
     • More products coming: AMD's Stream processor line of products, similar to NVIDIA's Tesla
     • Standard, portable programming via OpenCL
     • OpenCL (Open Computing Language) is the first open, royalty-free standard for general-purpose parallel programming; it enables portable code for a diverse mix of multi-core CPUs, GPUs, Cell-type architectures, and other parallel processors such as DSPs
     • More info:
  13. building GPU systems
     • Building systems to house GPUs can be difficult:
     • It requires a lot of engineering and design work to power and cool them correctly
     • GPUs were originally designed for visualization and gaming, so size and form factor were not as important
     • When GPUs are used for computation, data-center space is limited and expensive - you need a way to implement GPUs in existing infrastructure
  14. traditional GPU servers
     • Large tower-style cases
     • Rackmount servers 4U or larger
     • Either choice is an inefficient use of limited data center space
  15. GPUs are large
     • 10.5” long, 4.6” tall, 1.5” deep
     • The size of the GPU has limited its application
  16. GPUs are power hungry
     • GPU cards can use a lot of power - as much as 270W
     • Lots of power means lots of heat
     • Difficult to put into a small space and cool effectively
  17. GPU system options
     Advanced Clustering has two solutions to the power, heat, and density problems:
     • NVIDIA's Tesla S1070
     • Advanced Clustering's 15XGPU nodes
  18. NVIDIA's tesla S1070
     • The S1070 is an external 1U box that contains 4x Tesla C1060 GPUs
     • The S1070 must be connected to one or two host servers to operate
     • The S1070 has one power supply and dedicated cooling for the 4x GPUs
     • Only available with the C1060 GPU cards pre-installed
  19. tesla S1070 - front view
  20. tesla S1070 - rear view
  21. tesla S1070 - inside view
  22. host interface cards (HIC)
     • The Host Interface Card (HIC) connects the Tesla S1070 to a server
     • Every S1070 requires 2 HICs
     • Each HIC bridges the server to two of the four GPUs inside the S1070
     • HICs can be installed in 2 separate servers, or in 1 server
     • HICs are available in PCI-e 8x and 16x widths
  23. tesla S1070 block diagram
     [diagram: Tesla S1070 with cables to HICs in host system(s)]
  24. connecting the S1070 to 2 servers
     Most servers do not have enough PCI-e bandwidth, so the S1070 is designed to allow connecting to 2 separate machines.
  25. connecting the S1070 to 1 server
     If the server has enough PCI-e lanes and expansion slots, one Tesla S1070 can be connected to one server.
  26. example cluster of S1070s
     • 10x 1U compute nodes with 2x CPUs each
     • 5x Tesla S1070s with 4x GPUs each
     • A balanced system of 20 CPUs and 20 GPUs
     • All in 15U of rack space
  27. S1070 pros and cons
     Pros:
     • An external enclosure to hold the GPUs doesn't require a special server design
     • Easy to add GPUs to any existing system
     • 4 GPUs in only 1U of space
     • Multiple HIC card configurations, including PCI-e 8x or 16x
     • Thermally tested and validated by NVIDIA
     Cons:
     • Two GPUs share one PCI-e slot in the host server, limiting bandwidth to the GPU card
     • Most 1U servers only have 1x PCI-e expansion slot, which is occupied by the HIC - this limits the ability to use interconnects like InfiniBand or 10 Gigabit Ethernet
     • Limited configuration options: only Tesla cards, no GeForce or Quadro options
  28. S1070 - specifications
  29. advanced clustering GPU nodes
     • The 15XGPU line of systems is a complete two-processor server plus GPU in 1U
     • The server is fully configured with the latest quad-core Intel Xeon processors, RAM, hard drives, optical drive, networking, InfiniBand, and GPU card
     • Flexible enough to support various GPUs, including:
     • Tesla C1060 card
     • GeForce series
     • Quadro series
  30. GPU node - front
  31. GPU node - rear
  32. GPU node - inside
  33. GPU node - block diagram
     [diagram: Advanced Clustering 15XGPU node]
     Simplified design: the host server is completely integrated with the GPU, with no external components to connect.
  34. example cluster of GPU nodes
     • 15x 1U compute nodes
     • 2x CPUs each
     • 1x GPU integrated in each node
     • The entire system contains 30x CPUs and 15x GPUs
     • All in 15U of rack space
  35. GPU nodes - thermals
     • The system is carefully engineered to ensure all components fit in the small form factor
     • Detailed modeling and testing make sure the system components (CPU and memory) and the GPU are adequately cooled
  36. GPU node pros and cons
     Pros:
     • Entire server and GPU all enclosed in a 1U package
     • Flexibility in GPU choice: Tesla, GeForce, and Quadro supported
     • Full PCI-e bandwidth to the GPU
     • Full-featured server with the latest quad-core Intel Xeon CPUs
     • Can be used for more than computation - use the GPU for video output as well
     Cons:
     • Only 1x GPU per server
     • Requires the purchase of new servers; not an upgrade or add-on
     • Not as dense a solution as the S1070's 4x GPUs
  37. GPU nodes
     • The GPU node concept is unique to Advanced Clustering
     • The only vendor shipping a 1U with an integrated Tesla or high-end GeForce / Quadro card
     • Available for order as the 15XGPU2
     • Dual quad-core Intel Xeon 5500 series processors
     • Choice of GPU
  38. 15XGPU2 - specifications
     Processor:
     • Two Intel Xeon 5500 Series processors
     • Next-generation "Nehalem" microarchitecture
     • Integrated memory controller and 2x QPI interconnects per processor
     • 45nm process technology
     Chipset:
     • Intel 5500 I/O controller hub
     Memory:
     • 800MHz, 1066MHz, or 1333MHz DDR3 memory
     • Twelve DIMM sockets supporting up to 144GB of memory
     GPU:
     • PCI-e 2.0 16x double-height expansion slot for GPU
     • Multiple options: Tesla, GeForce, or Quadro cards
     Storage:
     • Two 3.5" SATA2 drive bays
     • Supports RAID levels 0-1 with Linux software RAID (with 2.5" drives)
     • DVD+RW slim-line optical drive
     Management:
     • Integrated IPMI 2.0 module
     • Integrated management controller providing iKVM and remote disk emulation
     • Dedicated RJ45 LAN for the management network
     I/O connections:
     • Two independent 10/100/1000Base-T (Gigabit) RJ-45 Ethernet interfaces
     • Two USB 2.0 ports
     • One DB-9 serial port (RS-232)
     • One VGA port
     • Optional ConnectX DDR or QDR InfiniBand connector
     Electrical requirements:
     • High-efficiency power supply (greater than 80%)
     • Output power: 560W
     • Universal input voltage 100V to 240V
     • Frequency: 50Hz to 60Hz, single phase
  39. availability
     • Both the Tesla S1070 and 15XGPU GPU nodes are available and shipping now
     • For pricing and custom configuration, contact your Account Representative
     • (866) 802-8222