CPU GPU OpenCL DirectCompute Accelerated Computing Roberto Brandão AMD Latin America
Agenda X86 PROCESSOR EVOLUTION THE GPU AS AN ACCELERATOR ACCELERATED PROCESSING UNITS INTRODUCTION TO OpenCL
Evolving x86 Processors
AMD "Istanbul" six-core architecture diagram: native six-core processor (cores 1 to 6); balanced caches (one L2 per core, shared L3 cache); crossbar; integrated memory controller (lower memory latency); HyperTransport (fast full-duplex bus); chipset; PCI-e
4P/24-core system example: very good scalability. One memory controller for every processor; full-duplex HyperTransport links (up to 5.2GHz); bus optimization: HT Assist (Cache Probe Filtering); still the only available 4P system with Direct Connect Architecture. [Diagram: each processor attached to its own memory]
Direct Connect Architecture 1.0: Balanced and Scalable Design to Support up to 6 Cores. No front side bus; HyperTransport™ technology; integrated memory controller; NUMA memory architecture. [Diagram: each CPU with 2 memory channels and 8 DIMMs per CPU]
Direct Connect Architecture 2.0: Balanced and Scalable Design to Support up to 16 Cores* per CPU. [Diagram: each CPU with 4 memory channels and 12 DIMMs per CPU]
Four memory channels
Up to 50% more DIMMs
Up to 33% increase in CPU to CPU communication speed±
Improved IPC (8 per cycle is a target)
Top500 list - beyond the petaflop Datacenters in the USA will spend more than $3 billion on energy in 2009
1997: Garry Kasparov vs. IBM Deep Blue
The World’s Most Powerful GPU = 177x  IBM Deep Blue
2011 GPU Architecture: AMD Radeon™ HD 6900 Series. Dual graphics engines; new VLIW4 core architecture; up to 24 SIMD engines; up to 96 texture units; upgraded render back-ends; improved anti-aliasing performance; fast 256-bit GDDR5 memory interface (up to 5.5 Gbps); new GPU compute features
Designing very efficient GPUs. Full load: 180W; Idle: 27W. [Chart: GFLOPS/W and GFLOPS/mm2; plotted values 14.47, 7.50, 7.90, 4.50, 2.21, 2.01, 4.56, 2.24, 1.07, 1.06, 0.92, 0.42]
Old and New in High Performance Computing
Old: Power is free, transistors are expensive. New: Power is expensive, transistors are free (can put more transistors on a chip than one can afford to turn on).
Old: Multiplies are slow, memory access is fast. New: Multiplies are fast, memory is slow (up to 200 clocks to DRAM memory, 4 clocks for an FP multiply).
Old: Increasing instruction-level parallelism via compiler innovation. New: Explicit thread and data parallelism must be exploited.
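The latency figures on this slide invite a rough back-of-envelope calculation. The sketch below uses only the two numbers quoted there (200 clocks for DRAM, 4 clocks for an FP multiply). The simple latency-hiding model and the function names are assumptions made for illustration, not something the slides specify.

```python
# Back-of-envelope arithmetic from the slide's figures.
# The latency-hiding model below is an illustrative assumption.
DRAM_LATENCY_CLOCKS = 200  # "up to 200 clocks to DRAM memory"
FP_MULTIPLY_CLOCKS = 4     # "4 clocks for FP multiply"


def multiplies_per_dram_access(dram_clocks: int, mul_clocks: int) -> int:
    """How many back-to-back multiplies fit into one DRAM round trip."""
    return dram_clocks // mul_clocks


def work_items_to_hide_latency(dram_clocks: int, compute_clocks: int) -> int:
    """Independent work items needed so one is always computing.

    Assumed model: each item computes for `compute_clocks`, then waits
    `dram_clocks` on memory, so 1 + ceil(dram / compute) items keep the
    arithmetic unit busy.
    """
    return 1 + -(-dram_clocks // compute_clocks)  # ceiling division


ratio = multiplies_per_dram_access(DRAM_LATENCY_CLOCKS, FP_MULTIPLY_CLOCKS)
items = work_items_to_hide_latency(DRAM_LATENCY_CLOCKS, FP_MULTIPLY_CLOCKS)
print(f"{ratio} multiplies per DRAM access; {items} work items to hide latency")
```

Under these assumptions a single thread stalls for the equivalent of dozens of multiplies on every memory miss, which is one way to read the slide's point that explicit thread and data parallelism must be exploited.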
GPUs: more than just gaming 2700 Both use GPUs Oil exploration platform - 2010 Wii Sports - Golf
DirectX® 11 Multi-Threading
Tasks like loading a texture or compiling a shader can execute in parallel with the main rendering thread. [Diagram: DirectX® 10 compared with DirectX® 11]
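The pattern this slide describes, resource work running alongside the main rendering thread, can be sketched in a language-neutral way. The example below is not the DirectX API. `load_texture`, `compile_shader`, and `render_frame` are hypothetical stand-ins, and Python's standard `concurrent.futures` is used only to illustrate the threading structure.

```python
from concurrent.futures import ThreadPoolExecutor


def load_texture(name):
    # Hypothetical stand-in for reading and decoding a texture file.
    return f"texture:{name}"


def compile_shader(source):
    # Hypothetical stand-in for compiling shader source code.
    return f"shader:{len(source)} chars"


def render_frame(index):
    # Hypothetical stand-in for the main thread's per-frame work.
    return f"frame:{index}"


def run(frames):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Hand resource preparation to background threads...
        texture_job = pool.submit(load_texture, "rock.png")
        shader_job = pool.submit(compile_shader, "float4 main() { ... }")
        # ...while the main thread keeps producing frames.
        rendered = [render_frame(i) for i in range(frames)]
        # Block only at the point where the resources are actually needed.
        return rendered, texture_job.result(), shader_job.result()


rendered, texture, shader = run(3)
print(rendered, texture, shader)
```

The design point is that the main loop never waits on resource preparation until it calls `result()`, which mirrors the slide's claim that such tasks no longer have to stall rendering.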
Today’s GPUs focused on GAMING ENTERTAINMENT PRODUCTIVITY
DirectX® 11 Tessellation. [Image comparison: DirectX® 10 (no tessellation) and DirectX® 11 (tessellation). Images courtesy of Unigine Corp.]
5/25/2011
Research companies already using: oil exploration; nature simulation; weather forecast; fluid dynamics
AMD Balanced Platform
GPU is ideal for data parallel algorithms like image processing, CAE, etc.
Great use for additional GPUs
CPU is excellent for running some algorithms
Great use for additional CPU cores
Graphics Workloads; Other Highly Parallel Workloads; Serial/Task-Parallel Workloads
Delivers optimal performance for a wide range of platform configurations
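The distinction this slide draws between data-parallel and serial workloads can be made concrete with two toy functions. This is a minimal sketch in plain Python. Both functions are illustrative assumptions chosen for this example and do not come from the slides.

```python
def brighten(pixels, delta):
    """Data parallel: every output pixel depends only on its own input pixel,
    so all elements could be processed at the same time (GPU-friendly)."""
    return [min(255, p + delta) for p in pixels]


def collatz_steps(n):
    """Serial as written: each iteration needs the previous iteration's value,
    so the loop cannot be spread across many processing elements."""
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps


print(brighten([0, 100, 250], 10))
print(collatz_steps(6))
```

In the first function the work per element is independent, which is the property that lets image processing scale with the number of GPU stream processors. The second has a loop-carried dependency, which is the kind of work the slide assigns to CPU cores.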
ATI Stream Technology is…
Heterogeneous: Developers leverage AMD GPUs and x86 CPUs for optimal application performance and user experience
High performance: Massively parallel, programmable GPU architecture delivers unprecedented performance and power efficiency
Industry Standards: OpenCL™ and DirectCompute 11 enable cross-platform development
Engineering; Sciences; Government; Gaming; Digital Content Creation; Productivity
Improvements already reached consumers
ATI Stream Processor utilization
Adobe Flash plugin used by YouTube.com
Lower processor usage
Video Transcoding Sample: No GPU Acceleration. CPU usage: 100%, using four CPU cores. GPU usage: 1%. [Diagram label: Frames]
Video Transcoding Sample: ATI GPU Acceleration. CPU usage: 45%. GPU usage: 35%, using hundreds of Stream Processors. [Diagram labels: Control, Frames]
FUSION TECHNOLOGY
Today TeraFLOPS-class GPU Multi-core CPU ~800 million transistors Multi-tasking Up to 2 billion transistors Games on multiple monitors Full HD video and audio
A new era in performance evolution
Single-Core. Challenge: power consumption, complexity
Multi-Core. Challenge: power consumption, software
Heterogeneous computing. Pros: power efficient. Cons: software availability?
[Charts: performance over time for each era, each marked "We are here"; axis labels: Single-thread performance, Performance, Time, Time x Cores]
A new era in performance evolution. [Diagram labels: Multi-Core; Single-Core; CPU; Core efficiency; Software; Acceleration; Low power consumption; Multimedia; Gaming; GPU]
Putting it all together: The Future is Fusion. [Diagram: RV500 GPU core (2006) with ring stops, client interfaces, write crossbar switch and memory controller; chipset; AMD "Istanbul" six-core processor with cores 1 to 6, L2 caches, L3 cache, crossbar, memory controller, HyperTransport and PCI-e]
Putting it all together: The Future is Fusion. [Diagram: RV700 GPU core (2008-2009); chipset; AMD "Istanbul" six-core processor with cores 1 to 6, L2 caches, L3 cache, crossbar, memory controller, HyperTransport and PCI-e]
Putting it all together: The Future is Fusion. [Diagram: RV700 GPU core and AMD "Istanbul" six-core processor, each with its crossbar]
2011: welcome to the APU time! APU GPU CPU “Supercomputing power in a notebook platform whose battery lasts for a full day”
One Design, Fewer Watts, Massive Capability: Dual-Core CPU + Northbridge + Discrete-level DirectX® 11 GPU = "Zacate" AMD Fusion APU
18 watts
59 sq. mm
8 watts
66 sq. mm
13 watts

 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

GPU and CPU Accelerated Computing

  • 1. CPU GPU OpenCL DirectCompute Accelerated Computing Roberto Brandão AMD Latin America
  • 2. Agenda X86 PROCESSOR EVOLUTION THE GPU AS AN ACCELERATOR ACCELERATED PROCESSING UNITS INTRODUCTION TO OpenCL
  • 4. AMD architecture: “Istanbul” six-core diagram. Chipset; balanced caches; native six-core processor (cores 1 to 6, six L2 caches); L3 cache; lower memory latency; crossbar; memory controller; HyperTransport (fast full-duplex bus); PCI-e
  • 5. 4P/24-core system example: very good scalability. One memory controller for every processor. Full-duplex HyperTransport links (up to 5.2GHz). Bus optimization: HT Assist (cache probe filtering). Still the only available 4P system with Direct Connect Architecture. Diagram labels: MEMORY (x4)
  • 6. Direct Connect Architecture 1.0: Balanced and Scalable Design to Support up to 6 Cores. Each of the four CPUs shown: 2 memory channels, 8 DIMMs per CPU. No front side bus; HyperTransport™ technology; integrated memory controller; NUMA memory architecture
  • 9. Up to 50% more DIMMs
  • 11. Improved IPC (8 per cycle is a target)
  • 12. Top500 list: beyond the petaflop. Datacenters in the USA will spend more than $3 billion on energy in 2009
  • 13. 1997: Garry Kasparov vs. IBM Deep Blue
  • 14. The World’s Most Powerful GPU = 177x IBM Deep Blue
  • 15. 2011 GPU Architecture AMD Radeon™ HD 6900 Series Dual graphics engines New VLIW4 core architecture Up to 24 SIMD engines Up to 96 Texture Units Upgraded render back-ends Improved anti-aliasing performance Fast 256-bit GDDR5 memory interface Up to 5.5 Gbps New GPU compute features
  • 16. Designing very efficient GPUs. Full load: 180 W; idle: 27 W. [Chart: GFLOPS/W and GFLOPS/mm2; data labels: 14.47, 7.50, 7.90, 4.50, 2.21, 2.01, 4.56, 2.24, 1.07, 1.06, 0.92, 0.42]
  • 17. Old and New in High Performance Computing. Old: power is free, transistors are expensive. New: power is expensive, transistors are free (can put more transistors on a chip than one can afford to turn on). Old: multiplies are slow, memory access is fast. New: multiplies are fast, memory is slow (up to 200 clocks to DRAM memory, 4 clocks for an FP multiply). Old: increasing instruction-level parallelism via compiler innovation. New: explicit thread and data parallelism must be exploited
  • 18. GPUs: more than just gaming 15 2700 Both use GPUs Oil exploration platform - 2010 Wii Sports - Golf
  • 20. Tasks like loading a texture or compiling a shader can execute in parallel with the main rendering thread. Comparison shown: DirectX® 10 and DirectX® 11
  • 21. Today’s GPUs focused on GAMING ENTERTAINMENT PRODUCTIVITY
  • 22. DirectX® 11 Tessellation. DirectX® 10: no tessellation. DirectX® 11: tessellation. Images courtesy of Unigine Corp.
  • 25. Research companies already using GPUs: oil exploration, nature simulation, weather forecasting, fluid dynamics
  • 28. Great use for additional CPU cores. Graphics Workloads; Other Highly Parallel Workloads; Serial/Task-Parallel Workloads. Delivers optimal performance for a wide range of platform configurations
  • 29. ATI Stream Technology is… Heterogeneous: developers leverage AMD GPUs and x86 CPUs for optimal application performance and user experience. High performance: massively parallel, programmable GPU architecture delivers unprecedented performance and power efficiency. Industry standards: OpenCL™ and DirectCompute 11 enable cross-platform development. Engineering Sciences Government Gaming Digital Content Creation Productivity
  • 32. Video Transcoding Sample: No GPU Acceleration. CPU usage: 100%, using four CPU cores. GPU usage: 1%. Diagram labels: Frames
  • 33. Video Transcoding Sample: ATI GPU Acceleration. CPU usage: 45%. GPU usage: 35%, using hundreds of Stream Processors. Diagram labels: Control, Frames
  • 35. Today. Multi-core CPU: ~800 million transistors; multi-tasking. TeraFLOPS-class GPU: up to 2 billion transistors; games on multiple monitors; Full HD video and audio
  • 37. Power efficient. Cons: software availability? [Charts: performance over time, including single-thread performance and performance against time x cores, each marked “We are here”]
  • 38. A new Era on performance evolution Multi-Core Single-Core CPU Core efficiency Software Acceleration Low power consumption Multimedia Gaming GPU
  • 39. Putting it all together – The Future is Fusion. RV500 GPU Core (2006): RingStops, Client Interfaces, Write Crossbar Switch, Memory Controller. Chipset. AMD “Istanbul” six-core processor: cores 1 to 6, L2 caches, L3 cache, crossbar, memory controller, HyperTransport, PCI-e
  • 40. Putting it all together – The Future is Fusion. Chipset. RV700 GPU Core (2008-2009). AMD “Istanbul” six-core processor: cores 1 to 6, L2 caches, L3 cache, crossbar, memory controller, HyperTransport, PCI-e
  • 41. Putting it all together – The Future is Fusion. RV700 GPU Core and AMD “Istanbul” six-core processor, each with a crossbar
  • 42. 2011: welcome to the APU time! APU GPU CPU “Supercomputing power in a notebook platform whose battery lasts for a full day”
  • 52. C6 and power gating
  • 53. Array of SIMD Engines
  • 55. Industry leading 3D and graphics processing
  • 56. 3rd Generation Unified Video Decoder
  • 58. DDR3 800-1066, 2 DIMMs, 64 bit channel
  • 60. Configurable externally as HDMI, DVI, and/or DisplayPort
  • 61. Also supports a single link LVDS for internal panels
  • 66. Certified OpenCL 1.0 compliant by the Khronos Group
  • 67. Write code that can scale well on multi-core CPUs and GPUs
  • 68. AMD delivers on the promise of OpenCL™, with both high-performance CPU and GPU technologies
  • 70. The power of Fusion: Leverages CPUs and GPUs for balanced system approach
  • 71. Broad industry support: Created by architects from AMD, Apple, IBM, Intel, Nvidia, Sony, etc.
  • 72. Fast track development: Ratified in December; AMD is the first company to provide a complete OpenCL solution
  • 73. Momentum: Enormous interest from mainstream developers and application ISVs. More stream-enabled applications across all markets
  • 78. Comparing OpenCL™ and DirectX® 11 DirectCompute. How will developers choose between OpenCL™ and DirectX® 11 DirectCompute? Feature set is similar in both APIs. DirectX® 11 DirectCompute: easiest path to add compute capabilities to existing DirectX applications; Windows Vista® and Windows® 7 only. OpenCL™: ideal path for new applications porting to the GPU for the first time; true multiplatform (Windows®, Linux®, MacOS); natural programming without dealing with a graphics API
  • 80. Subset of ISO C99 with language extensions - familiar to developers
  • 81. Well-defined numerical accuracy - IEEE 754 rounding behavior with defined maximum error
  • 82. Online or offline compilation and build of compute kernel executables
  • 84. Query, select and initialize compute devices
  • 85. Create compute contexts and work-queues
  • 88. Summary: X86 PROCESSOR EVOLUTION; THE GPU AS AN ACCELERATOR; ACCELERATED PROCESSING UNITS; INTRODUCTION TO OpenCL. http://developer.amd.com
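
The OpenCL slides describe compute kernels that are built and then run through work-queues on a compute device, and slide 17 says explicit data parallelism must be exploited. As a rough illustration of that idea only, the sketch below emulates the model in plain Python. It does not use the OpenCL API. The names `vec_add_kernel` and `enqueue_nd_range` are hypothetical and do not come from the slides: the first plays the role of a kernel that handles one element per work-item, and the second is a stand-in for a queue that launches one work-item per index.

```python
from typing import Callable, List, Sequence


def vec_add_kernel(gid: int, a: Sequence[float], b: Sequence[float],
                   out: List[float]) -> None:
    """Body of one work-item: handles exactly one element, chosen by its id."""
    out[gid] = a[gid] + b[gid]


def enqueue_nd_range(kernel: Callable[..., None], global_size: int, *args) -> None:
    """Hypothetical stand-in for a device queue (not an OpenCL call).

    Runs the kernel once per id in range(global_size). A real device would
    run many work-items concurrently; this loop runs them one after another,
    which gives the same result here because the work-items are independent.
    """
    for gid in range(global_size):
        kernel(gid, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
result = [0.0] * len(a)

# One work-item per element of the input vectors.
enqueue_nd_range(vec_add_kernel, len(a), a, b, result)
print(result)
```

The point of the design is that the kernel contains no loop over the data: the parallelism is expressed by how many work-items are launched, which is what lets the same kernel scale across multi-core CPUs and GPUs.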

Editor's Notes

  1. Our new technology pillars that will help the channel differentiate
  2. Explain how 3 monitors can be less expensive than a single 30” monitor, e.g. a 3x22” solution at ~$500 vs. a single 30” at > $1000. On productivity, also explain that ISVs continue to leverage multi-monitor, e.g. in MS Office 2010 PowerPoint you can open multiple files in multiple windows.
  3. Original legal approval – Maranello Platform Launch, March 2010. The first generation DCA introduced features now expected in the market. [Cover features at bottom quickly and go to next slide.]
  4. Original legal approval – Maranello Platform Launch, March 2010. Today’s introduction brings DCA 2.0: four memory channels; 12 DIMMs per CPU; supports up to 12 cores today, will support next-gen core with up to 16 per CPU. Let’s take a closer look at the effect of memory on workloads. [Next slide.]
  5. done
  6. Add more deep blue computers
  7. Add “All models ATI Radeon™”. Add “as of this date the HD5870 GPU has the highest GFLOPS/mm2 of all known products”.
  8. Explain how 3 monitors can be less expensive than a single 30” monitor, e.g. a 3x22” solution at ~$500 vs. a single 30” at > $1000. On productivity, also explain that ISVs continue to leverage multi-monitor, e.g. in MS Office 2010 PowerPoint you can open multiple files in multiple windows.
  9. Work on the slide (larger text)
  10. Using ATI Stream technology, enjoy better visual quality when you watch streaming video online (YouTube/Hulu) with new video enhancement features.*
  11. Explain how 3 monitors can be less expensive than a single 30” monitor, e.g. a 3x22” solution at ~$500 vs. a single 30” at > $1000. On productivity, also explain that ISVs continue to leverage multi-monitor, e.g. in MS Office 2010 PowerPoint you can open multiple files in multiple windows.
  12. Let’s look at today’s compute platforms: on the left you have a Phenom II with 758 million transistors on 45nm process technology; on the right you see a 5870 DX11 GPU with 2.15 billion transistors on 40nm process technology. Today, with the emergence of visual computing, you see more work than ever before for the GPU, especially with what is arguably the most important workload for consumers: video. The explosion of HD video and now HD gaming means the GPU matters more than ever in the PC platform. More user-generated content puts more of the work onto the GPU, such as video processing and rendering and 3D user interfaces. The era of visual computing is already becoming more about mobility and being able to do more of what I’ve just described on the go. However, users do not want more compute capabilities at the expense of battery life or smaller form factors. Favoring one component over the other, or taking a niche approach to balanced visual computing platforms, does not meet the needs of the mass market. Usage scenarios favor a combination of GPU/CPU balance and low power.
  13. Now, many of you are technologists, so you are probably glad to see me finally start talking about some technology: the workload changes are also dramatically impacting chip architectures. This chart does a good job of demonstrating the evolution of chip architectures. Starting on the left of the X axis, you go back in time to highly programmable, single-core CPUs, which aimed to increase throughput (Y axis) over time by first adding threads, then cores. GPUs, on the other hand, started out way to the right in terms of throughput and have been becoming more and more programmable. We call this evolution the move from Homogeneous Computing to Heterogeneous Computing, finally ending up at the top right, where the two arrows meet in what we call an APU: a combination of different types of cores, working closely together on different types of workloads for optimum performance per watt per mm2. This is AMD’s architectural vision of the future and where we are heading with our first APU in 2011, the Llano processor, our first integrated CPU + GPU on a single piece of silicon.
  14. WHERE WE ARE TODAY. Attempt to provide an environment in which optimized hardware can provide higher absolute performance, better power efficiency, and lower cost. At the same time, the goal is to dramatically improve programmer productivity, as the cost of software development is substantially the same as hardware development. This means support for heterogeneous multi-core hardware and a much more effective application programming environment are critical. This chart does a good job of summarizing the evolution of chip architectures. Starting on the left of the X axis, you go back in time to highly programmable, single-core CPUs, which aimed to increase throughput (Y axis) over time by first adding threads, then cores. GPUs, on the other hand, started out way to the right in terms of throughput and have been becoming more and more programmable. We call this evolution the move from Homogeneous Computing to Heterogeneous Computing, finally ending up at the top right, where the two arrows meet in what we call an APU: a combination of different types of cores, working closely together on different types of workloads for optimum performance per watt per mm2.
  15. The need for this optimal energy-efficient balance of CPU and GPU represents the beginning of a new era of computing in 2011. The Fusion of CPU and GPU compute power is what the next chapter in visual computing requires: a powerful visual computing experience at home or on the go without compromise. Our AMD Fusion™ design is driven by mobility and is based on a low-power visual compute architecture that will enhance active and resting battery life while increasing both CPU and GPU performance. This is the culmination of the vision of ‘One AMD’, and only AMD can deliver the GPU and CPU combination that will be the future of computing.
  16. Review slide to determine message
  17. The industry has always tried to move away from proprietary technology and towards open standards when available. The proprietary Apple Display Connector never became popular, since DVI was license-free and widely available. 3dfx’s Glide API for 3D graphics failed to stick around in the market long after DirectX was available on a wide variety of hardware. NVIDIA’s Cg language was never widely used, since OpenGL and DirectX provided a compelling open alternative. The Unified Display Interface was a failed interface backed by Intel and NVIDIA, which was deprecated in favor of the license-free DisplayPort standard. RAMBUS has tried to bring many proprietary memory technologies to market, but they have always been displaced by JEDEC open memory standards. CUDA is a proprietary GPGPU model whose specification is controlled by only one company; we believe it will soon be replaced by OpenCL and the DirectX Compute Shader.