Amd accelerated computing -ufrj

  • Our new technology pillars that will help the channel differentiate
  • Explain how three monitors can cost less than a single 30” monitor, e.g. a 3x22” solution at ~$500 vs. a single 30” at > $1000. On productivity, also explain that ISVs continue to leverage multi-monitor setups, e.g. in MS Office 2010, PowerPoint can open multiple files in multiple windows.
  • Original legal approval – Maranello Platform Launch, March 2010. The first generation DCA introduced features now expected in the market. [Cover features at bottom quickly and go to next slide.]
  • Original legal approval – Maranello Platform Launch, March 2010. Today’s introduction brings DCA 2.0: four memory channels; 12 DIMMs per CPU; supports up to 12 cores today, will support next-gen core with up to 16 per CPU. Let’s take a closer look at the effect of memory on workloads. [Next slide.]
  • Add more deep blue computers
  • Add “All models ATI Radeon™”. Add “as of this date the HD5870 GPU has the highest GFLOPS/mm2 of all known products”.
  • Work on the slide (larger text)
  • Using ATI Stream technology, enjoy better visual quality when you watch streaming video online (YouTube/Hulu) with new video enhancement features.*
  • Let’s look at today’s compute platforms: on the left you have a Phenom II with 758 million transistors on 45nm process technology; on the right you see a 5870 DX11 GPU with 2.15 billion transistors on 40nm process technology. Today, with the emergence of visual computing, you see more work than ever before for the GPU, especially with what is arguably the most important workload for consumers: video. The explosion of HD video and now HD gaming means the GPU matters more than ever in the PC platform. More user-generated content puts more of the work onto the GPU, such as video processing and rendering and 3D user interfaces. The era of visual computing is already becoming more about mobility and being able to do more of what I’ve just described on the go. However, users do not want more compute capabilities at the expense of battery life or smaller form factors. Favoring one component over the other or taking a niche approach to balanced visual computing platforms does not meet the needs of the mass market. Usage scenarios favor a combination of GPU/CPU balance and low power.
  • Now, many of you are technologists, so you are probably glad to see me finally start talking about some technology: the workload changes are also dramatically impacting chip architectures. This chart does a good job of demonstrating the evolution of chip architectures. Starting on the left of the X axis, you go back in time to highly programmable, single-core CPUs, which aimed to increase throughput (Y axis) over time by first adding threads, then cores. GPUs, on the other hand, started out way to the right in terms of throughput and have been becoming more and more programmable. We call this evolution the move from Homogeneous Computing to Heterogeneous Computing, finally ending at the top right, where the two arrows meet, in what we call an APU: a combination of different types of cores working closely together on different types of workloads for optimum performance per watt per mm2. This is AMD’s architectural vision of the future and where we are heading with our first APU in 2011, the Llano processor, our first integrated CPU + GPU on a single piece of silicon.
  • WHERE WE ARE TODAY: Attempt to provide an environment in which optimized hardware can provide higher absolute performance, better power efficiency, and lower cost. At the same time, the goal is to dramatically improve programmer productivity, as the cost of software development is substantially the same as that of hardware development. This means support for heterogeneous multi-core hardware and a much more effective application programming environment are critical. This chart does a good job of summarizing the evolution of chip architectures. Starting on the left of the X axis, you go back in time to highly programmable, single-core CPUs, which aimed to increase throughput (Y axis) over time by first adding threads, then cores. GPUs, on the other hand, started out way to the right in terms of throughput and have been becoming more and more programmable. We call this evolution the move from Homogeneous Computing to Heterogeneous Computing, finally ending at the top right, where the two arrows meet, in what we call an APU: a combination of different types of cores working closely together on different types of workloads for optimum performance per watt per mm2.
  • The need for this optimal energy-efficient balance of CPU and GPU represents the beginning of a new era of computing in 2011. The Fusion of CPU and GPU compute power is what the next chapter in visual computing requires: a powerful visual computing experience at home or on the go without compromise. Our AMD Fusion™ design is driven by mobility and is based on a low-power visual compute architecture that will enhance active and resting battery life while increasing both CPU and GPU performance. This is the culmination of the vision of ‘One AMD’, and only AMD can deliver the GPU and CPU combination that will be the future of computing.
  • Review slide to determine message
  • The industry has always tried to move away from proprietary technology and towards open standards when available. The proprietary Apple Display Connector never became popular since DVI was license-free and widely available. 3dfx’s Glide API for 3D graphics failed to stick around in the market long after DirectX was available on a wide variety of hardware. Nvidia’s Cg language was never widely used since OpenGL and DirectX provided a compelling open alternative. The Unified Display Interface was a failed interface backed by Intel and Nvidia, which was deprecated in favor of the license-free DisplayPort standard. Rambus has tried to bring many proprietary memory technologies to market, but they have always been displaced by JEDEC open memory standards. CUDA is a proprietary GPGPU model whose specification is controlled by a single company; we believe it will soon be replaced by OpenCL and the DirectX Compute Shader.

    1. 1. CPU<br />GPU<br />OpenCL<br />DirectCompute<br />Accelerated Computing<br />Roberto Brandão<br />AMD Latin America<br />
    2. 2. Agenda<br />X86 PROCESSOR EVOLUTION<br />THE GPU AS AN ACCELERATOR<br />ACCELERATED PROCESSING UNITS<br />INTRODUCTION TO OpenCL<br />
    3. 3. Evolving x86 Processors<br />
    4. 4. AMD architecture<br />“Istanbul” six-core diagram<br />Chipset<br />Balanced<br />caches<br />2<br />3<br />4<br />5<br />6<br />1<br />Native <br />six-core <br />processor<br />L2<br />L2<br />L2<br />L2<br />L2<br />L2<br />L3 Cache <br />Lower memory <br />latency<br />CROSSBAR<br />Memory <br />Controller<br />Hyper <br />Transport<br />HyperTransport<br />Fast full-duplex<br />bus<br />PCI-e<br />
    5. 5. 4P/24-core system example: very good scalability<br />One memory controller for every processor<br />Full-duplex HyperTransport links (up to 5.2GHz)<br />Bus Optimization: HT Assist (Cache Probe Filtering)<br />Still the only available 4P system with Direct Connect Architecture<br />MEMORY<br />MEMORY<br />MEMORY<br />MEMORY<br />
    6. 6. Direct Connect Architecture 1.0<br />Balanced and Scalable Design to Support up to 6 Cores<br />2 MEMORY<br /> CHANNELS<br />2 MEMORY<br /> CHANNELS<br />8 DIMMs per CPU<br />8 DIMMs per CPU<br />2 MEMORY<br /> CHANNELS<br />2 MEMORY<br /> CHANNELS<br />8 DIMMs per CPU<br />8 DIMMs per CPU<br />No front side bus<br />HyperTransport™ technology<br />Integrated memory controller<br />NUMA memory architecture<br />
    7. 7. Direct Connect Architecture 2.0<br />Balanced and Scalable Design to Support up to 16 Cores* per CPU <br />4 MEMORY<br /> CHANNELS<br />4 MEMORY<br /> CHANNELS<br />12 DIMMs per CPU<br />12 DIMMs per CPU<br />4 MEMORY<br /> CHANNELS<br />4 MEMORY<br /> CHANNELS<br />12 DIMMs per CPU<br />12 DIMMs per CPU<br /><ul><li>1-hop between processors
    8. 8. Four memory channels
    9. 9. Up to 50% more DIMMs
    10. 10. Up to 33% increase in CPU to CPU communication speed±</li></li></ul><li>What is next for x86 CPUs<br /><ul><li>More processor cores to come </li></ul>(12, 16, 16 double cores)<br /><ul><li>More memory channels (improves memory bandwidth per core)
    11. 11. Improved IPC </li></ul>(8 per cycle is a target)<br />
    12. 12. Top500 list - beyond the petaflop<br />Datacenters in the USA will spend more than $3 billion on energy in 2009 <br />
    13. 13. 1997:<br />X<br /> Garry Kasparov IBM Deep Blue<br />
    14. 14. The World’s Most Powerful GPU<br />=<br />177x <br />IBM Deep Blue<br />
    15. 15. 2011 GPU Architecture AMD Radeon™ HD 6900 Series<br />Dual graphics engines<br />New VLIW4 core architecture<br />Up to 24 SIMD engines<br />Up to 96 Texture Units<br />Upgraded render back-ends<br />Improved anti-aliasing performance<br />Fast 256-bit GDDR5 memory interface<br />Up to 5.5 Gbps<br />New GPU compute features<br />
    16. 16. Designing very efficient GPUs<br />Full load: 180W; Idle: 27W<br />[Chart: GFLOPS/W and GFLOPS/mm2 by GPU; plotted values: 14.47, 7.50, 7.90, 4.50, 2.21, 2.01, 4.56, 2.24, 1.07, 1.06, 0.92, 0.42]<br />
    17. 17. Old and New in High Performance Computing<br />Old: Power is free, Transistors are expensive<br />New: Power expensive, Transistors free<br />(Can put more transistors on chip than can afford to turn on)<br />Old: Multiplies are slow, Memory access is fast<br />New: Multiplies fast, Memory slow<br />(up to 200 clocks to DRAM memory, 4 clocks for FP multiply)<br />Old: Increasing Instruction Level Parallelism via compiler innovation<br />New: Explicit thread and data parallelism must be exploited<br />
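
The "multiplies fast, memory slow" point can be made concrete with the two figures quoted on the slide. The snippet below is a minimal sketch of that arithmetic; the function names and the naive cost model (no caches, no overlap of memory and compute) are assumptions added for illustration and do not come from the presentation.

```python
# Figures quoted on the slide: up to 200 clocks for a DRAM access,
# 4 clocks for a floating-point multiply.
DRAM_ACCESS_CLOCKS = 200
FP_MULTIPLY_CLOCKS = 4


def multiplies_per_dram_access(dram_clocks=DRAM_ACCESS_CLOCKS,
                               mul_clocks=FP_MULTIPLY_CLOCKS):
    """How many FP multiplies fit in the time of one DRAM access."""
    return dram_clocks / mul_clocks


def estimated_clocks(n_loads, n_multiplies,
                     dram_clocks=DRAM_ACCESS_CLOCKS,
                     mul_clocks=FP_MULTIPLY_CLOCKS):
    """Naive serial cost model (assumption: no caching, no overlap)."""
    return n_loads * dram_clocks + n_multiplies * mul_clocks


if __name__ == "__main__":
    print(multiplies_per_dram_access())  # multiplies per worst-case DRAM access
    print(estimated_clocks(2, 1))        # one multiply whose two operands come from DRAM
```

Under this simplified model, a multiply whose two operands both come from DRAM spends almost all of its time waiting on memory, which is why the slide stresses exploiting explicit data parallelism.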
    18. 18. GPUs: more than just gaming<br />15<br />2700<br />Both use GPUs<br />Oil exploration platform - 2010<br />Wii Sports - Golf<br />
    19. 19. DirectX® 11 Multi-Threading<br /><ul><li>Application, DirectX runtime, and DirectX driver can each run in separate threads
    20. 20. Tasks like loading a texture or compiling a shader can execute in parallel with main rendering thread</li></ul>DirectX® 10<br />DirectX® 11<br />16<br />
    21. 21. Today’s GPUs focused on<br />GAMING<br />ENTERTAINMENT<br />PRODUCTIVITY<br />
    22. 22. DirectX® 11 Tessellation<br />DirectX® 10<br />DirectX® 11<br />No Tessellation<br />Tessellation<br />Images courtesy of Unigine Corp.<br />18<br />
    23. 23. 5/25/2011<br />
    24. 24. 5/25/2011<br />
    25. 25. Research companies already using<br />21<br />Oil exploration<br />Nature simulation<br />Weather forecast<br />Fluid Dynamics<br />
    26. 26. AMD Balanced Platform<br />GPU is ideal for data parallel algorithms like image processing, CAE, etc<br /><ul><li>Great use for ATI Stream technology
    27. 27. Great use for additional GPUs</li></ul>CPU is excellent for running some algorithms<br /><ul><li>Ideal place to process if GPU is fully loaded
    28. 28. Great use for additional CPU cores</li></ul>Graphics Workloads<br />Other Highly Parallel Workloads<br />Serial/Task-Parallel Workloads<br />Delivers optimal performance for a wide range of platform configurations<br />
    29. 29. ATI Stream Technology is…<br />Heterogeneous: Developers leverage AMD GPUs and x86 CPUs for optimal application performance and user experience<br />High performance: Massively parallel, programmable GPU architecture delivers unprecedented performance and power efficiency<br />Industry Standards: OpenCL™ and DirectCompute 11 enable cross-platform development <br />Engineering<br />Sciences<br />Government<br />Gaming<br />Digital Content Creation<br />Productivity<br />
    30. 30. Improvements already reached consumers<br />ATI <br />Stream<br />Processor utilization<br />Adobe Flash plugin used by Youtube.com<br /><ul><li> Better image quality and video smoothness
    31. 31. Lower processor usage</li></li></ul><li>GPU-accelerated video transcoding<br />iPod Video<br />HD Video<br />Up to 6x faster when using an AMD graphics card<br />
    32. 32. Video Transcoding Sample: No GPU Acceleration<br />CPU Usage: 100%<br />Frames<br />Frames<br />Using four<br />CPU Cores<br />GPU Usage: 1%<br />
    33. 33. Video Transcoding Sample: ATI GPU Acceleration<br />CPU Usage: 45%<br />Control<br />Control<br />Frames<br />Frames<br />GPU Usage: 35%<br />Using hundreds of<br />Stream Processors<br />
    34. 34. FUSION TECHNOLOGY<br />
    35. 35. Today<br />TeraFLOPS-class GPU<br />Multi-core CPU<br />~800 million transistors<br />Multi-tasking<br />Up to 2 billion transistors<br />Gaming on multiple monitors<br />Full HD video and audio<br />
    36. 36. A new era in performance evolution<br />Multi-Core<br />Heterogeneous <br />computing<br />Single-Core<br />Challenge:<br />Power consumption<br />Software<br />Challenge:<br />Power consumption<br />Complexity<br />Pros:<br /><ul><li>Performance
    37. 37. Power efficient</li></ul>Cons:<br />Software availability<br />?<br />Single-thread<br />We are here<br />Performance<br />Performance<br />We are here<br />We are here<br />Time x Cores<br />Time<br />Time<br />
    38. 38. A new era in performance evolution<br />Multi-Core<br />Single-Core<br />CPU<br />Core efficiency <br />Software <br />Acceleration<br />Low power consumption<br />Multimedia<br />Gaming<br />GPU<br />
    39. 39. Putting it all together – The Future is Fusion<br />RingStop<br />Client Interface<br />Client Interface<br />Client Interface<br />Client Interface<br />Write Crossbar Switch<br />Memory<br />Controller<br />RingStop<br />RingStop<br />Chipset<br />Client Interface<br />Client Interface<br />Client Interface<br />Client Interface<br />RingStop<br />RV500 GPU Core (2006)<br />AMD “Istanbul” six-core processor<br />2<br />3<br />4<br />5<br />6<br />1<br />L2<br />L2<br />L2<br />L2<br />L2<br />L2<br />Cache L3<br />CROSSBAR<br />Memory <br />Controller<br />Hyper <br />Transport<br />HyperTransport<br />PCI-e<br />
    40. 40. Putting it all together – The Future is Fusion<br />Chipset<br />RV700 GPU Core (2008-2009)<br />AMD “Istanbul” six-core processor<br />2<br />3<br />4<br />5<br />6<br />1<br />L2<br />L2<br />L2<br />L2<br />L2<br />L2<br />Cache L3<br />CROSSBAR<br />Memory <br />Controller<br />Hyper <br />Transport<br />HyperTransport<br />PCI-e<br />
    41. 41. Putting it all together – The Future is Fusion<br />RV700 GPU Core<br />AMD “Istanbul” six-core processor<br />CROSSBAR<br />CROSSBAR<br />
    42. 42. 2011: welcome to the APU era!<br />APU<br />GPU<br />CPU<br />“Supercomputing power in a notebook platform whose battery lasts for a full day”<br />
    43. 43. One Design, Fewer Watts, Massive Capability <br />“Zacate” AMD Fusion APU <br />Discrete-level DirectX® 11 GPU <br />Dual-Core CPU<br />+<br />+<br />=<br />Northbridge<br /><ul><li>75 sq. mm
    44. 44. 18 watts
    45. 45. 59 sq. mm
    46. 46. 8 watts
    47. 47. 66 sq. mm
    48. 48. 13 watts
    49. 49. 117 sq. mm
    50. 50. 25 watts </li></li></ul><li>Graphics and Media Processing Efficiency Improvements<br />2011 APU-based Platform <br />2010 IGP-based Platform<br />~17 GB/sec<br />~17 GB/sec<br />CPU Cores<br />DDR3 DIMM<br />Memory<br />CPU Cores<br />DDR3 DIMM<br />Memory<br />CPU Chip<br />APU Chip<br />UVD<br />UNB / MC<br />MC<br />UNB<br />GPU<br />~27 GB/sec<br />~7 GB/sec<br />Graphics requires memory bandwidth to bring full capabilities to life<br />GPU<br />UVD<br />~27 GB/sec<br />PCIe<br />SB Functions<br />3X bandwidth between GPU and memory<br />Even the same sized GPU is substantially more effective in this configuration<br />Eliminate latency and power associated with the extra chip crossing<br />Substantially smaller physical foot print<br />PCIe<br />Bandwidth pinch points and latency hold back the GPU capabilities<br />
    51. 51. “Ontario” & “Zacate” Architecture<br />APU<br /><ul><li>2 x86 CPU Cores (40nm “Bobcat” core – 1 MB L2, 64-bit FPU)
    52. 52. C6 and power gating
    53. 53. Array of SIMD Engines
    54. 54. DX11 graphics performance
    55. 55. Industry leading 3D and graphics processing
    56. 56. 3rd Generation Unified Video Decoder
    57. 57. H.264, VC1, DivX/Xvid formats
    58. 58. DDR3 800-1066, 2 DIMMs, 64 bit channel
    59. 59. BGA package</li></ul>Display and I/O<br /><ul><li>Two dedicated digital display interfaces
    60. 60. Configurable externally as HDMI, DVI, and/or Display Port
    61. 61. Also supports a single link LVDS for internal panels
    62. 62. Integrated VGA
    63. 63. 5x8 PCIe®
    64. 64. “Hudson” Fusion Controller Hub</li></li></ul><li>OpenCL<br />Working together<br />
    65. 65. ATI Stream SDK: OpenCL™ For Multicore x86 CPUs and GPUs<br />http://developer.amd.com/<br />The Power of Fusion: Developers leverage heterogeneous architecture to deliver superior user experience<br /><ul><li>First complete OpenCL™ development platform
    66. 66. Certified OpenCL 1.0 compliant by the Khronos Group
    67. 67. Write code that can scale well on multi-core CPUs and GPUs
    68. 68. AMD delivers on the promise of OpenCL™, with both high-performance CPU and GPU technologies
    69. 69. Available for download now as part of ATI Stream SDK beta program – includes documentation, samples, and developer support</li></li></ul><li>OpenCL™: Game-Changing Development<br />Enabling Broad Adoption of GP-GPU Capabilities<br /><ul><li>Industry standard API: Open, multiplatform development platform for heterogeneous architectures
    70. 70. The power of Fusion: Leverages CPUs and GPUs for balanced system approach
    71. 71. Broad industry support: Created by architects from AMD, Apple, IBM, Intel, Nvidia, Sony, etc.
    72. 72. Fast track development: Ratified in December; AMD is the first company to provide a complete OpenCL solution
    73. 73. Momentum: Enormous interest from mainstream developers and application ISVs</li></ul>More stream-enabled applications across all markets<br />
    74. 74. Open Standards:<br />Maximize Developer Freedom and Addressable Market<br />Vendor specific<br />Cross-platform limiters<br />Vendor neutral<br />Cross-platform enablers<br /><ul><li>Apple Display Connector
    75. 75. 3dfx Glide</li></ul>Digital Visual Interface<br />OpenCL™<br />DirectX®<br /><ul><li>Nvidia CUDA
    76. 76. Nvidia Cg
    77. 77. Rambus</li></ul>Certified DP<br />JEDEC<br />OpenGL®<br /><ul><li>Unified Display Interface</li></ul>OpenCL™ and DirectX® are emerging as the two most important standards for heterogeneous (CPU+GPU) compute<br />
    78. 78. Comparing OpenCL™ and DirectX® 11 DirectCompute<br />How will developers choose between OpenCL™ and DirectX® 11 DirectCompute?<br />Feature set is similar in both APIs<br />DirectX® 11 DirectCompute<br />Easiest path to add compute capabilities to existing DirectX applications<br />Windows Vista® and Windows® 7 only<br />OpenCL™<br />Ideal path for new applications porting to the GPU for the first time<br />True multiplatform: Windows®, Linux®, MacOS<br />Natural programming without dealing with a graphics API<br />
    79. 79. Anatomy of OpenCL™<br />Language Specification<br /><ul><li>C-based cross-platform programming interface
    80. 80. Subset of ISO C99 with language extensions - familiar to developers
    81. 81. Well-defined numerical accuracy - IEEE 754 rounding behavior with defined maximum error
    82. 82. Online or offline compilation and build of compute kernel executables
    83. 83. Includes a rich set of built-in functions</li></ul>Platform Layer API<br />Runtime API<br /><ul><li>A hardware abstraction layer over diverse computational resources
    84. 84. Query, select and initialize compute devices
    85. 85. Create compute contexts and work-queues
    86. 86. Execute compute kernels
    87. 87. Manage scheduling, compute, and memory resources</li></li></ul><li>OpenCL Example<br />
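
The platform-layer and runtime steps listed above (query and select a device, create a work-queue, execute a kernel) can be sketched without any OpenCL installation. The following is a plain-Python emulation of the data-parallel execution model only; it does not call the OpenCL API. The names `Device`, `CommandQueue`, `query_devices`, `enqueue_nd_range`, and `vector_add_kernel`, as well as the device list, are hypothetical and were invented for this illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Device:
    """Stand-in for a compute device (hypothetical, not an OpenCL type)."""
    name: str
    compute_units: int


def query_devices() -> List[Device]:
    """Mimics platform-layer device discovery with made-up devices."""
    return [Device("cpu-emulated", 4), Device("gpu-emulated", 20)]


@dataclass
class CommandQueue:
    """Mimics a work-queue: launches are recorded, then run on finish()."""
    device: Device
    pending: List[Callable[[], None]] = field(default_factory=list)

    def enqueue_nd_range(self, kernel, global_size, *args):
        # The kernel body is executed once per work-item id (0..global_size-1).
        def launch():
            for gid in range(global_size):
                kernel(gid, *args)
        self.pending.append(launch)

    def finish(self):
        # Run every recorded launch in order, then empty the queue.
        for launch in self.pending:
            launch()
        self.pending.clear()


def vector_add_kernel(gid, a, b, out):
    """Kernel: each work-item handles exactly one element."""
    out[gid] = a[gid] + b[gid]


def run_vector_add(a, b):
    # Select the device with the most compute units, then build a queue on it.
    device = max(query_devices(), key=lambda d: d.compute_units)
    queue = CommandQueue(device)
    out = [0.0] * len(a)
    queue.enqueue_nd_range(vector_add_kernel, len(a), a, b, out)
    queue.finish()
    return out


if __name__ == "__main__":
    print(run_vector_add([1, 2, 3], [10, 20, 30]))
```

On real hardware the per-item loop in `enqueue_nd_range` is what the device runs in parallel; the sequential loop here only shows that each work-item is independent of the others.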
    88. 88. Summary<br />46<br />X86 PROCESSOR EVOLUTION<br />THE GPU AS AN ACCELERATOR<br />ACCELERATED PROCESSING UNITS<br />INTRODUCTION TO OpenCL<br />http://developer.amd.com<br />
    89. 89. Thank you!<br />roberto.brandao@amd.com<br />
