AMD Accelerated Computing - UFRJ

Accelerated Computing: CPU, GPU, OpenCL, DirectCompute. Roberto Brandão, AMD Latin America

Agenda: x86 processor evolution; the GPU as an accelerator; accelerated processing units; introduction to OpenCL

AMD architecture: “Istanbul” six-core diagram. Native six-core processor; balanced caches; lower memory latency; HyperTransport fast full-duplex bus. [Diagram: cores 1-6, each with its own L2; L3 cache; crossbar; memory controller; HyperTransport; chipset; PCI-e.]

4P/24-core system example: very good scalability. One memory controller for every processor. Full-duplex HyperTransport links (up to 5.2GHz). Bus optimization: HT Assist (cache probe filtering). Still the only available 4P system with Direct Connect Architecture. [Diagram: four processors, each with its own memory.]

Direct Connect Architecture 1.0: balanced and scalable design to support up to 6 cores. Each of the four CPUs in the diagram has 2 memory channels and 8 DIMMs. No front-side bus; HyperTransport™ technology; integrated memory controller; NUMA memory architecture.

Direct Connect Architecture 2.0: balanced and scalable design to support up to 16 cores* per CPU. Each of the four CPUs in the diagram has 4 memory channels and 12 DIMMs.

Up to 33% increase in CPU to CPU communication speed±

Improved IPC (8 per cycle is a target)

Top500 list: beyond the petaflop. Datacenters in the USA will spend more than $3 billion on energy in 2009.

1997: Garry Kasparov vs. IBM Deep Blue

The World’s Most Powerful GPU = 177x IBM Deep Blue

2011 GPU Architecture: AMD Radeon™ HD 6900 Series. Dual graphics engines; new VLIW4 core architecture; up to 24 SIMD engines; up to 96 texture units; upgraded render back-ends; improved anti-aliasing performance; fast 256-bit GDDR5 memory interface, up to 5.5 Gbps; new GPU compute features.
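As a rough check on the memory figures on this slide, theoretical peak bandwidth follows from bus width multiplied by the data rate. The sketch below assumes that the 5.5 Gbps figure is a per-pin rate, which the slide does not state; the function name is illustrative only.

```python
def peak_bandwidth_gb_per_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    Assumes every pin of the bus transfers at the quoted per-pin rate.
    """
    total_gbit_per_s = bus_width_bits * gbps_per_pin
    return total_gbit_per_s / 8  # 8 bits per byte


# 256-bit GDDR5 interface at up to 5.5 Gbps, as quoted on the slide.
bandwidth = peak_bandwidth_gb_per_s(256, 5.5)
print(f"{bandwidth:.0f} GB/s")  # prints "176 GB/s"
```

This is an upper bound under the stated assumption; sustained bandwidth in real workloads is lower.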

Designing very efficient GPUs. Full load: 180W; idle: 27W. [Chart: GFLOPS/W and GFLOPS/mm2; highest value labeled 14.47 GFLOPS/W; other data labels, in slide order: 7.50, 7.90, 4.50, 2.21, 2.01, 4.56, 2.24, 1.07, 1.06, 0.92, 0.42.]

Old and New in High Performance Computing. Old: power is free, transistors are expensive. New: power is expensive, transistors are free (we can put more transistors on a chip than we can afford to turn on). Old: multiplies are slow, memory access is fast. New: multiplies are fast, memory is slow (up to 200 clocks to DRAM memory, 4 clocks for an FP multiply). Old: increasing instruction-level parallelism via compiler innovation. New: explicit thread and data parallelism must be exploited.
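The "multiplies fast, memory slow" point can be made concrete with the two clock counts quoted on the slide. The sketch below is a deliberately simple model and not a description of any specific processor: it assumes every load pays the full 200 clocks (no caches), multiplies are not pipelined, and memory access never overlaps with computation. The function names are illustrative.

```python
DRAM_ACCESS_CLOCKS = 200  # "up to 200 clocks to DRAM memory" (slide figure)
FP_MULTIPLY_CLOCKS = 4    # "4 clocks for FP multiply" (slide figure)


def multiplies_per_dram_access() -> float:
    """How many FP multiplies fit in the time of one worst-case DRAM access."""
    return DRAM_ACCESS_CLOCKS / FP_MULTIPLY_CLOCKS


def memory_stall_fraction(multiplies_per_load: int) -> float:
    """Fraction of total clocks spent waiting on memory.

    Serialized model (an assumption): one DRAM access, then
    `multiplies_per_load` multiplies, with no overlap between the two.
    """
    compute_clocks = multiplies_per_load * FP_MULTIPLY_CLOCKS
    return DRAM_ACCESS_CLOCKS / (DRAM_ACCESS_CLOCKS + compute_clocks)


print(multiplies_per_dram_access())          # 50.0
print(f"{memory_stall_fraction(1):.2f}")     # 0.98
print(f"{memory_stall_fraction(50):.2f}")    # 0.50
```

Under these assumptions, a kernel doing one multiply per load spends about 98% of its time waiting on memory, which is why hiding memory latency with many threads in flight matters so much.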

GPUs: more than just gaming. Both use GPUs: an oil exploration platform (2010) and Wii Sports Golf. (Figure label: 2700.)

DirectX® 11 Multi-Threading. Tasks like loading a texture or compiling a shader can execute in parallel with the main rendering thread. [Diagram: DirectX® 10 vs. DirectX® 11.]
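The idea of running resource work alongside the main rendering thread can be sketched in a language-neutral way. The example below does not use the Direct3D API; it only illustrates the structure with Python's standard thread pool, and `load_texture`, `compile_shader`, and `render_frames` are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def load_texture(name):
    # Hypothetical stand-in for an expensive texture load.
    return f"texture:{name}"


def compile_shader(source):
    # Hypothetical stand-in for shader compilation.
    return f"compiled:{source}"


def render_frames(count):
    # Stand-in for the main rendering loop; returns the frame indices drawn.
    return [frame for frame in range(count)]


def run():
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Submit resource work to background threads.
        texture_future = pool.submit(load_texture, "rock")
        shader_future = pool.submit(compile_shader, "water.hlsl")
        # The main thread keeps rendering while the workers run.
        frames = render_frames(3)
        # Collect the results once rendering needs them.
        return frames, texture_future.result(), shader_future.result()


print(run())
```

The main thread only blocks at the point where it actually needs the loaded resource, which is the behavior the slide contrasts with the earlier API.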

Today’s GPUs are focused on gaming, entertainment, and productivity.

DirectX® 11 Tessellation. [Image comparison: DirectX® 10, no tessellation; DirectX® 11, tessellation.] Images courtesy of Unigine Corp.

Research companies already using GPUs: oil exploration, nature simulation, weather forecasting, fluid dynamics.

AMD Balanced Platform. GPU is ideal for data-parallel algorithms like image processing, CAE, etc.: a great use for additional GPUs. CPU is excellent for running some algorithms: a great use for additional CPU cores. [Diagram labels: graphics workloads; other highly parallel workloads; serial/task-parallel workloads.] Delivers optimal performance for a wide range of platform configurations.
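The split between data-parallel and serial workloads can be illustrated with two small functions. This is a plain-Python sketch with made-up names, not GPU code: the first function has fully independent elements, which is the shape of work that maps well onto many GPU threads, while the second, in this straightforward form, carries a dependency from each output to the next.

```python
def brighten(pixel, gain=1.5):
    """Per-pixel operation: the result depends only on this pixel."""
    return min(255, int(pixel * gain))


def brighten_image(pixels):
    # Every element is independent, so each one could be handled by a
    # separate thread or GPU work-item in any order.
    return [brighten(p) for p in pixels]


def running_filter(samples, alpha=0.5):
    """Each output depends on the previous output: a serial chain."""
    out = []
    previous = 0.0
    for s in samples:
        previous = alpha * s + (1 - alpha) * previous
        out.append(previous)
    return out


print(brighten_image([0, 100, 200]))   # [0, 150, 255]
print(running_filter([2.0, 4.0]))      # [1.0, 2.5]
```

In the slide's terms, the first pattern is a candidate for the GPU, while the second is the kind of work that stays on a CPU core.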

ATI Stream Technology is… Heterogeneous: developers leverage AMD GPUs and x86 CPUs for optimal application performance and user experience. High performance: massively parallel, programmable GPU architecture delivers unprecedented performance and power efficiency. Industry standards: OpenCL™ and DirectCompute 11 enable cross-platform development. [Labels: Engineering, Sciences, Government, Gaming, Digital Content Creation, Productivity.]

Improvements already reached consumers. ATI Stream processor utilization: Adobe Flash plugin used by YouTube.com. Lower processor usage.

Video Transcoding Sample: no GPU acceleration. CPU usage: 100% (frames processed using four CPU cores). GPU usage: 1%.

Video Transcoding Sample: ATI GPU acceleration. CPU usage: 45%. GPU usage: 35%, using hundreds of Stream Processors. [Diagram labels: control, frames.]

Today. Multi-core CPU: ~800 million transistors; multi-tasking. TeraFLOPS-class GPU: up to 2 billion transistors; games on multiple monitors; Full HD video and audio.

A new era in performance evolution. Three eras: Single-Core, Multi-Core, Heterogeneous computing. Challenges listed: power consumption, software; power consumption, complexity. Pros: power efficient. Cons: software availability? [Charts: performance over time for each era, each marked “We are here”; axis labels: Single-thread, Time, Time x Cores.]

A new era in performance evolution. [Diagram labels, in slide order: Multi-Core, Single-Core, CPU, Core efficiency, Software, Acceleration, Low power consumption, Multimedia, Gaming, GPU.]

Putting it all together – The Future is Fusion. [Diagram: RV500 GPU core (2006) with ring stops, client interfaces, write crossbar switch, and memory controller; AMD “Istanbul” six-core processor with cores 1-6, six L2 caches, L3 cache, crossbar, memory controller, HyperTransport; chipset; PCI-e.]

Putting it all together – The Future is Fusion. [Diagram: RV700 GPU core (2008-2009); AMD “Istanbul” six-core processor with cores 1-6, six L2 caches, L3 cache, crossbar, memory controller, HyperTransport; chipset; PCI-e.]

Putting it all together – The Future is Fusion. [Diagram: RV700 GPU core and AMD “Istanbul” six-core processor, each with its crossbar.]

2011: welcome to the APU era! [Diagram labels: APU, GPU, CPU.] “Supercomputing power in a notebook platform whose battery lasts for a full day”

One Design, Fewer Watts, Massive Capability. Dual-core CPU + discrete-level DirectX® 11 GPU + northbridge = “Zacate” AMD Fusion APU.
