Nvidia SC13 Podcast
Upcoming SlideShare
Loading in...5
×
 

Nvidia SC13 Podcast

on

  • 957 views

In this slidecast, Sumit Gupta from Nvidia discusses the latest product news on GPU computing for HPC. ...

In this slidecast, Sumit Gupta from Nvidia discusses the latest product news on GPU computing for HPC.

* IBM and NVIDIA Partner to Build Next-Generation Supercomputers
* NVIDIA Launches the Tesla K40 GPU Accelerator, their fastest accelerator ever

Learn more: http://nvidianews.nvidia.com/Releases/NVIDIA-Launches-World-s-Fastest-Accelerator-for-Supercomputing-and-Big-Data-Analytics-a66.aspx

Watch the video presentation: http://wp.me/p3RLHQ-aRY

Statistics

Views

Total Views
957
Views on SlideShare
956
Embed Views
1

Actions

Likes
0
Downloads
16
Comments
0

1 Embed 1

http://www.linkedin.com 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

CC Attribution License

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Nvidia SC13 Podcast Nvidia SC13 Podcast Presentation Transcript

  • SUPERCOMPUTING 2013 PRESS DECK Sumit Gupta | General Manager, Tesla Accelerated Computing
  • SC13 News 1 IBM Taps GPU Accelerators 2 New Product Announcements 3 New Supercomputer Announcements
  • Accelerated Computing Growing Fast 2x Growth in One Year 50% Percent of HPC Systems With Accelerators 44% Hundreds of GPU Accelerated Apps 300 242 250 40% 200 30% 22% 24% 150 20% NVIDIA GPU is Accelerator of Choice INTEL PHI 4% OTHERS 11% 182 113 100 10% 50 0% 0 2010 2011 2012 Intersect360 Research HPC User Site Census: Systems, July 2013 NVIDIA GPUs 85% 2011 2012 2013 Intersect360 Research HPC User Site Census: Systems, July 2013 View slide
  • IBM Using GPUs to Accelerate Enterprise & Data Analytics Applications Application Infrastructure Business Intelligence Predictive Analytics Risk Analytics View slide
  • IBM Partners with NVIDIA to Build NextGeneration Supercomputers + Tesla GPU POWER8 CPU GPU-Accelerated POWER-Based Systems Available in 2014
  • GPU Computing in Data Centers Power ARM64 x86 x86 2007 2008 2009 2010 2011 2012 2013 2014
  • Linux GCC Compiler to Support GPU Accelerators Open Source OpenACC in GCC by Mentor Graphics & Samsung Pervasive Impact Free to all Linux users Mainstream Most Widely Used HPC Compiler “ Incorporating OpenACC into GCC is an excellent example of open source and open standards working together to make accelerated computing broadly accessible to all Linux developers. ” 7 OpenACC-standard.org confidential Oscar Hernandez Oak Ridge National Laboratory
  • SC13 News 1 IBM Taps GPU Accelerators 2 New Product Announcements 3 New Supercomputer Announcements
  • Tesla K40 World’s Fastest Accelerator for Supercomputing and Big Data Analytics CUDA 6 Dramatically Simplifies Parallel Programming with Unified Memory
  • Tesla K40 World’s Fastest Accelerator FASTER 1.4 TF| 2880 Cores | 288 GB/s ns/day 5 LARGER 2x Memory Enables More Apps AMBER Benchmark 4 SMARTER Unlock Extra Performance Using Power Headroom 6GB 3 2 Fluid Rendering Dynamics Seismic Analysis 1 0 CPU K20X K40 GPU Boost 12GB AMBER Benchmark: SPFP-Nucleosome CPU: Dual E5-2687W @ 3.10GHz, 64GB System Memory, CentOS 6.2, GPU systems: Single Tesla K20X or Single Tesla K40
  • GPU Boost Up to 25% Extra Performance on Applications Use Power Headroom to Run at Higher Clocks 1.40 25% Faster 1.20 20% Faster 14% Faster 17% Faster 1.00 0.80 13% Faster 0.60 0.40 0.20 11% Faster 0.00 AMBER SPFP-TRPCage Tesla K40 (base) LAMMPS-EAM NAMD 2.9-APOA1 Tesla K40 with GPU Boost
  • ANNOUNCING Unified Memory CUDA 6
  • Unified Memory Dramatically Lower Developer Effort Developer View Today System Memory GPU Memory Developer View With Unified Memory Unified Memory
  • Super Simplified Memory Management Code CPU Code void sortfile(FILE *fp, int N) { char *data; data = (char *)malloc(N); CUDA 6 Code with Unified Memory void sortfile(FILE *fp, int N) { char *data; cudaMallocManaged(&data, N); fread(data, 1, N, fp); qsort(data, N, 1, compare); qsort<<<...>>>(data,N,1,compare); cudaDeviceSynchronize(); use_data(data); use_data(data); free(data); } fread(data, 1, N, fp); cudaFree(data); }
  • SC13 News 1 IBM Taps GPU Accelerators 2 New Product Announcements 3 New Supercomputer Announcements
  • Fastest Supercomputer In Europe 6.27 PetaFLOPS (80% Linpack Efficiency) Piz Daint Greenest Petascale System 3110 MFLOPS/W #2: JUQUEEN: 2176 MFLOPS/W Production-Grade Weather Forecasts: COSMO 7 National Weather Agencies Germany | Greece | Italy | Poland | Russia | Romania | Switzerland
  • Greenest Supercomputer in the World Tokyo Tech KFC System 4000+ MFLOPS per Watt 25% Higher than #1 Green500 System 160 Tesla K20X GPUs Oil Immersion Technology Current Green500 #1: CINECA Eurora System, Italy, 3208 MF/W
  • ANSYS Fluent Doubles Performance with GPUs Automobile Drag Simulation Throughput 30 Number of Jobs per Day 25 90% Faster 20 15 2x 10 Better Insight for Low Drag Design 5 2% 0 CPU K40 2 x E5-2680 CPUs 8 cores used; 2 Tesla K40s Sedan Geometry, 3.6M mixed cells Steady, turbulent, external aerodynamics- Coupled PBNS, DP Solver 1.5B Less Drag Gal. of Fuel Saved/Year
  • SUPERCOMPUTING 2013 PRESS DECK Sumit Gupta | General Manager, Tesla Accelerated Computing
  • Additional Information
  • Tesla K40 20-40% Faster than K20X on Applications 1.5 1.4x K20X 1.3x 1.2x 1.3x K40 @ base 1.3x 1.3x K40 @ boost 1.3x 1.0 0.5 0.0 ANSYS 14 LAMMPS NAMD 2.9 AMBER LSMS QMCPACK SMP-V14sp-4 EAM APOA1 SPFP-Nucleosome Fe32 3x3x1 CUBLAS
  • First Tesla K40 Customers CSC Finland Texas Advanced Computing Center CEA France Swinburne Australia
  • Tesla K40 OEM Partners
  • K20X K40 Peak Single Precision Peak SGEMM 3.93 TF 2.95 TF 4.29 TF 3.22 TF Peak Double Precision Peak DGEMM 1.31 TF 1.22 TF 1.43 TF 1.33 TF Memory size 6 GB 12 GB Memory BW (ECC off) 250 GB/s 288 GB/s Memory Clock 2.6 GHz 3.0 GHz PCIe Gen Gen 2 Gen 3 # of Cores 2688 2880 Core Clock 732 MHz Base: 745 MHz Boost Clocks: 810 & 875 Mhz Total Board Power 235W 235W Form Factor PCIe Passive PCIe Passive, Active 9