Welcome. Today I am excited to show you how NVIDIA Tesla GPU solutions are having a profound impact on science by breaking new barriers in computing performance. Researchers all over the world have embraced computing as the third pillar of science. Now, with Tesla GPU computing, explosive performance gains are allowing academic researchers to discover new theories, build more robust models, and publish more papers. I will share highlights of academic institutions and researchers achieving their goals of faster, better science while staying within academic budget constraints.
As computing becomes essential to reaching new frontiers in science and research, we identified the barriers that stand in the way. First, we need to enable researchers and scientists to make more discoveries, faster and with higher accuracy. Second, we need to do that with maximum performance per dollar, because we all have budgets. Third, we need to do it in the most efficient manner, whether that is efficiency in power or efficiency in space.
It’s exciting to show that GPU computing addresses the most important barriers to delivering game-changing capability in computational research. For example, AMBER, a very popular computational chemistry application, lets researchers see 6x more simulation data per day: it achieves 88 nanoseconds in a day, which would take a week to simulate on CPUs alone. Now let’s see how much that actually costs. By adding just 50% to the cost of a system, you get over a 300% performance gain. And finally, GPUs are very power efficient. The #2 and #3 most powerful supercomputers in the world are a great example: China’s Tianhe-1A, in the #2 spot, is 2.5x more power efficient than Oak Ridge’s CPU-only Jaguar system.
We have certainly reached the inflection point of broad adoption of GPU computing. Over 580 universities teach GPU computing as part of their regular curriculum. In fact, this year the Chinese Ministry of Education will require 200 of its higher education institutions to make NVIDIA’s CUDA parallel programming part of the curriculum. There is also a growing trend of government funding being awarded to GPU projects by the NIH, NSF, or DOE. These awards go not only to large projects, like Oak Ridge’s Titan project, which incorporates some 18 thousand GPUs, but also to university infrastructure grants and department or research grants to develop GPU computing applications.
UCLA faced many of the barriers of HPC. They needed to accelerate an innovative new plasma simulation, and they needed to overcome space and power constraints. Their solution was a cluster with 96 nodes and 288 NVIDIA Tesla GPUs. The impact was considerable. Upgrading the GPUs resulted in 20% higher performance at the same power cost. Additionally, GPU use extended to new groups within the department for greatly accelerated modeling. So they were able to offer more and faster performance while fitting within the budget they had for both space and power.
NVIDIA’s GPU-accelerated application footprint is growing exponentially year over year. Computational scientists and developers have realized that the future is in parallel computing. Native GPU acceleration has now made its way into the scientific applications that are most widely used and most often published against. This breadth of applications enables the domain scientists in each school and department, specifically those who aren’t programmers, to reap the benefits of GPU acceleration.
Just as important as ready-made applications for domain scientists, we have been developing easier and easier ways to build your own applications for GPUs. The fastest and easiest approach is our “drop-in” libraries. Many scientific applications make wide use of standard template or math libraries. NVIDIA makes the most commonly used ones freely available, such as Thrust, a templated library, and many math libraries such as BLAS, FFT, and sparse matrix routines. Another extremely non-invasive way to accelerate an application is to apply OpenACC directives to your existing code. It takes only a few lines of code to get a 2-10 times speedup in just a matter of days or hours. Finally, if you are a developer and need the maximum amount of performance, we support you in your native programming language.
Engineers and scientists worldwide rely on MATLAB to accelerate the pace of discovery, innovation, and development in disciplines such as automotive, aerospace, electronics, financial services, biotech, and many other industries. Engineers and scientists are successfully employing GPU technology to accelerate their discipline-specific calculations. With minimal effort and without extensive knowledge of GPUs, you can now use the power of GPUs with MATLAB.
(Note: this script was written for the AMBER 11 benchmarks. The slide shows K20 results.)
I briefly spoke about AMBER’s price performance in our opening. Now that you have seen how easy it is for researchers and scientists to benefit from GPU computing, with ready-to-go applications or easy-to-implement developer approaches such as directives, we should revisit price performance. Look again at a single node: adding 2 GPUs increases the node cost by about 50%, yet we get much more than a 50% performance improvement. In fact, with this application we achieve more than 300% higher performance, making GPUs a clear winning investment.
Additional information on the K20 slide:
1 CPU node (dual CPUs) = 12.47 ns/day
1 CPU + GPU node (dual CPUs and GPUs) = 95.59 ns/day
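For anyone who wants to check the price-performance argument themselves, here is a minimal sketch of the arithmetic. It uses the two K20 throughput figures quoted above. The 50% added node cost comes from the older AMBER 11 script, so applying it to the K20 numbers is an assumption, and the function names are made up for illustration.

```python
def throughput_gain(baseline_ns_per_day, accelerated_ns_per_day):
    """How many times more simulation time per day the accelerated node delivers."""
    return accelerated_ns_per_day / baseline_ns_per_day


def perf_per_dollar_gain(gain, added_cost_fraction):
    """Throughput gain divided by the relative cost of the upgraded node."""
    return gain / (1.0 + added_cost_fraction)


# Figures quoted for the K20 slide.
cpu_node_ns_per_day = 12.47   # dual-CPU node
gpu_node_ns_per_day = 95.59   # dual-CPU node plus 2 GPUs

# Assumption: the "about 50% more per node" figure from the AMBER 11 script
# also holds for the K20 configuration.
added_cost_fraction = 0.50

gain = throughput_gain(cpu_node_ns_per_day, gpu_node_ns_per_day)
value = perf_per_dollar_gain(gain, added_cost_fraction)

print(f"Throughput gain: {gain:.2f}x")
print(f"Performance per dollar gain: {value:.2f}x")
```

Under that cost assumption, the node delivers roughly 7.7x the throughput and roughly 5.1x the performance per dollar. Swap in your own quote for the added cost to see how the ratio moves.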
NAMD, another extremely popular molecular dynamics package, runs 2.7x to 3.8x faster with GPUs, depending on the node count. We benchmarked it with a typical STMV benchmark, which is 1 million atoms. So this is a very large system. But these are the systems and simulation times researchers need to make breakthroughs in science.

Benchmark data:
Nodes   s/step (GPU XK6)   s/step (CPU XK6)   ns/day (Fermi XK6)   ns/day (CPU XK6)
32      1.2414             4.62633            0.069599             0.018676
64      0.660887           2.36707            0.13073339           0.03650082
128     0.342743           1.19722            0.252084             0.072167
256     0.199465           0.609124           0.433159             0.141843
512     0.10837            0.314745           0.797269             0.274508
640     0.089752           0.255016           0.962655             0.338802
768     0.0774948          0.209511           1.114913517          0.412388848
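The benchmark figures above can be cross-checked with a short sketch that converts seconds per step into nanoseconds per day and computes the GPU speedup at each node count. The 1 femtosecond timestep is an assumption: it is not stated in the notes, but it is the value that reproduces the ns/day figures from the s/step figures. The helper names are illustrative only.

```python
NODES = [32, 64, 128, 256, 512, 640, 768]
GPU_S_PER_STEP = [1.2414, 0.660887, 0.342743, 0.199465, 0.10837, 0.089752, 0.0774948]
CPU_S_PER_STEP = [4.62633, 2.36707, 1.19722, 0.609124, 0.314745, 0.255016, 0.209511]

SECONDS_PER_DAY = 86_400
# Assumption: 1 fs per step, inferred from the numbers, not stated in the source.
TIMESTEP_NS = 1e-6


def ns_per_day(s_per_step, timestep_ns=TIMESTEP_NS):
    """Simulated nanoseconds per wall-clock day for a given time per step."""
    return SECONDS_PER_DAY / s_per_step * timestep_ns


def speedups(cpu_s_per_step, gpu_s_per_step):
    """Per-node-count speedup of the GPU runs over the CPU runs."""
    return [cpu / gpu for cpu, gpu in zip(cpu_s_per_step, gpu_s_per_step)]


ratios = speedups(CPU_S_PER_STEP, GPU_S_PER_STEP)

for nodes, gpu, cpu, ratio in zip(NODES, GPU_S_PER_STEP, CPU_S_PER_STEP, ratios):
    print(f"{nodes:>4} nodes: GPU {ns_per_day(gpu):.4f} ns/day, "
          f"CPU {ns_per_day(cpu):.4f} ns/day, speedup {ratio:.2f}x")
```

The speedup is largest at the smallest node count and tapers to about 2.7x at 768 nodes, which is the usual shape of a strong-scaling curve: the problem size is fixed, so each added node has less work to accelerate.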
Today, more than ever, it’s easy for researchers, scientists, and academic institutions to benefit from GPU computing. We have ready-to-go GPU-accelerated applications (see the Applications Catalog). We are continuously investing in the easiest approaches to quickly accelerating your own applications, with OpenACC directives being our latest development. And finally, the GPU Test Drive cluster is the ideal way to test how a particular application accelerates with GPUs. The GPU Test Drive cluster is also pre-configured for easy purchase and installation.
Thank you for following along. I hope we have shown you that GPU computing is making extraordinary contributions to science and research. Now is the time to reach your next scientific computing achievements by investing in NVIDIA Tesla GPUs, which have worldwide adoption and world-class developer support.
GPU Computing In Higher Education And Research
ACCELERATE RESEARCH
NVIDIA TESLA
Lift the Barriers of HPC
Faster / More Research: Faster, More Discovery, Higher Accuracy
Maximum Performance: More Performance per dollar
Greater Budget & Power Efficiencies: More Performance per watt
GPU Impact to Computational Research
More Research: 88 ns/day, 6x Faster (JAC simulation time, 23,558 Atoms, DHFR, AMBER 11)
Maximum Performance: 318% Higher Performance, 54% Added Cost
Efficient Power: 2.5x Flops / Watt (Tianhe-1A: CPU + GPU; Jaguar: CPU only)
CPU: Dual socket Intel Xeon X5670, 2.93 GHz (12 cores)
Axel Kohlmeyer: Temple University
Tianhe-1A: #2 Top500; Jaguar: #3 Top500
UCLA Department of Physics and Astronomy
Challenge
  Accelerate Plasma Research with innovative Particle-in-Cell (PIC) Simulations
  Overcome space and power constraints in data centers
  Integrate into shared computing strategy across institutes and centers at UCLA
Solution
  GPU cluster
  96 server nodes
  288 NVIDIA Tesla GPUs
  Upgraded GPUs to NVIDIA Tesla M2090s (from M2070)
Impact
  Upgrades resulted in 20% higher performance with same power cost
  GPUs extended to new groups within department for greatly accelerated modeling
  Solves faster performance requirements within limited space and power constraints
  #235 on prestigious Top500 list with only 6 Racks
Add GPUs: Accelerate Science Applications CPU GPU
3 Ways to Accelerate Applications
Libraries (“Drop-in” Acceleration): THRUST, BLAS, LAPACK, FFT, NPP, Sparse, Imaging, RNG
OpenACC Directives (Easily Accelerate Applications): PGI Accelerator, CAPS HMPP, CRAY
Programming Languages (Maximum Flexibility): C, C++, Fortran, OpenCL, DirectCompute, Java, Python
GPU-Accelerated MATLAB Results
10x speedup in data clustering via K-means clustering algorithm
14x speedup in template matching routine (part of cancer cell image analysis)
3x speedup in estimating 7.6 million contract prices using Black-Scholes model
17x speedup in simulating the movement of 3072 celestial objects
4x speedup in adaptive filtering routine (part of acoustic tracking algorithm)
4x speedup in wave equation solving (part of seismic data processing algorithm)
AMBER 12 - Extreme Performance with K20
DHFR JAC 23K Atoms (NVE)
Running AMBER 12 GPU Support Revision 12.1, SPFP, with CUDA 4.2.9, ECC Off
The blue node contains 2x Intel E5-2687W CPUs (8 Cores per CPU)
Each green node contains 2x Intel E5-2687W CPUs (8 Cores per CPU) plus 2x NVIDIA K20 GPU
[Bar chart, Nanoseconds / Day, DHFR: 1 Node (CPU only) = 12.47; 1 Node (CPU + 2x K20) = 95.59]
Gain > 7.5X throughput/performance by adding just 2 K20 GPUs when compared to dual CPU performance
NAMD 2.9 Outstanding Strong Scaling with Multi-STMV
100 STMV on Hundreds of Nodes
Running NAMD version 2.9
Each blue XE6 CPU node contains 1x AMD 1600 Opteron (16 Cores per CPU)
Each green XK6 CPU+GPU node contains 1x AMD 1600 Opteron (16 Cores per CPU) and an additional 1x NVIDIA X2090 GPU
Concatenation of 100 Satellite Tobacco Mosaic Virus
[Line chart, Nanoseconds / Day versus # of Nodes (32, 64, 128, 256, 512, 640, 768); series: Fermi XK6 and CPU XK6; annotated speedups: 2.7x, 2.9x, 3.6x, 3.8x]
Accelerate your science by 2.7-3.8x when compared to CPU-based supercomputers
Try NVIDIA GPUs
Available Applications: Applications Catalog, www.nvidia.com/appscatalog
Quick Application Acceleration: OpenACC Directives, www.nvidia.com/gpudirectives
Easy & Free GPU Test Drive: GPU Test Drive Cluster, www.nvidia.com/gputestdrive