AFDS 2012 Phil Rogers Keynote: THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY
 

Phil Rogers goes deeper into what HSA is and some of the areas it can address, following his first presentation on HSA in 2011. He also announces the HSA Foundation and its founding members.

Presentation Transcript

  • THE PROGRAMMER’S GUIDE TO A UNIVERSE OF POSSIBILITY: Heterogeneous System Architecture. Phil Rogers, AMD Corporate Fellow
  • Most parallel code runs on CPUs designed for scalar workloads 2 | The Programmer’s Guide to a Universe of Possibility | June 12, 2012
  • WHAT DID WE HEAR FROM TOM MALLOY THIS MORNING?
  • CHANGING THE THINKING: Typically platform builders create innovative new hardware and offer an API for software to access it. That tired thinking has only ever had niche success!
  • HETEROGENEOUS SYSTEM ARCHITECTURE ROADMAP
  • HETEROGENEOUS SYSTEM ARCHITECTURE: Brings All the Processors in a System into Unified Coherent Memory. Power efficient; easy to program; future looking; established technology foundation; open standard; industry support.
  • THE HSA OPPORTUNITY ON MODERN APPLICATIONS. Chart: Developer Return (differentiation in performance, reduced power, features, time to market) against Developer Investment (effort, time, new skills). PROBLEM: Historically, developers program CPUs: ~10+M* CPU coders, ~4M apps, good user experiences. PROBLEM: GPU/HW blocks hard to program; not all workloads accelerate: ~100K GPU coders, ~200 apps, significant niche value. SOLUTION: HSA + Libraries = productivity & performance with low power: few M HSA coders, few 100Ks HSA apps, wide range of differentiated experiences. *IDC
  • APPLICATION AREAS WITH ABUNDANT PARALLEL WORKLOADS. Biometric Recognition: secure, fast, accurate: face, voice, fingerprints. Natural UI & Gestures: touch, gesture, and voice. Augmented Reality: superimpose graphics, audio, and other digital information as a virtual overlay. Content Everywhere: content from any source to any display seamlessly. AV Content Management: searching, indexing and tagging of video & audio; multimedia data mining. Beyond HD Experiences: streaming media, new codecs, 3D, transcode, audio.
  • HAAR Face Detection: CORNERSTONE TECHNOLOGY FOR COMPUTER VISION
  • LOOKING FOR FACES IN ALL THE RIGHT PLACES
  • LOOKING FOR FACES IN ALL THE RIGHT PLACES. Quick HD Calculations: Search square = 21 x 21; Pixels = 1920 x 1080 = 2,073,600; Search squares = 1900 x 1060 = ~2 Million
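The search-square count on this slide follows from sliding a 21 x 21 window one pixel at a time across a 1920 x 1080 frame. A quick check of that arithmetic, as a minimal sketch (the function name `count_search_squares` is hypothetical, and the one-pixel step is an assumption consistent with the slide's 1900 x 1060 figure):

```python
def count_search_squares(width, height, square=21):
    """Number of positions a square x square window can occupy, stepping one pixel."""
    if width < square or height < square:
        return 0
    return (width - square + 1) * (height - square + 1)


pixels = 1920 * 1080
squares = count_search_squares(1920, 1080)
print(pixels)   # 2073600
print(squares)  # 2014000, i.e. roughly 2 million
```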
  • LOOKING FOR DIFFERENT SIZE FACES – BY SCALING THE VIDEO FRAME
  • LOOKING FOR DIFFERENT SIZE FACES – BY SCALING THE VIDEO FRAME. More HD Calculations: 70% scaling in H and V; Total Pixels = 4.07 Million; Search squares = 3.8 Million
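The totals on this slide can be approximated by summing over successively smaller copies of the frame, each 70% of the previous width and height. The sketch below assumes the scaling stops once a 21 x 21 square no longer fits and that dimensions are rounded down; the slide states neither, so treat the stopping rule and the name `pyramid_totals` as illustrative.

```python
def pyramid_totals(width=1920, height=1080, square=21, num=7, den=10):
    """Sum pixels and search squares over frames shrunk to 70% per level.

    Integer arithmetic (num/den) avoids floating-point rounding surprises.
    """
    total_pixels = 0
    total_squares = 0
    level = 0
    while True:
        w = width * num**level // den**level
        h = height * num**level // den**level
        if w < square or h < square:
            break  # assumed stopping rule: the search square no longer fits
        total_pixels += w * h
        total_squares += (w - square + 1) * (h - square + 1)
        level += 1
    return total_pixels, total_squares, level


total_pixels, total_squares, levels = pyramid_totals()
print(f"{levels} levels, {total_pixels / 1e6:.2f}M pixels, "
      f"{total_squares / 1e6:.2f}M search squares")
```

Under these assumptions the result lands close to the slide's rounded 4.07 million pixels and 3.8 million search squares; small differences come from rounding and from where the scaling is cut off.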
  • HAAR CASCADE STAGES (flow diagram): Stage N evaluates Feature k, Feature l, Feature m, then asks “Face still possible?”. Yes: continue to Stage N+1 (Feature p, Feature q, Feature r). No: REJECT FRAME.
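The stage-by-stage flow in this diagram can be sketched as a loop with an early exit. This is an illustration only: the feature callables, thresholds, and the name `run_cascade` are hypothetical stand-ins, not the detector used in the talk.

```python
def run_cascade(window, stages):
    """Return (is_face, stages_evaluated) for one search window.

    `stages` is a list of (features, threshold) pairs; each feature is a
    callable scoring the window. Both are hypothetical stand-ins.
    """
    for depth, (features, threshold) in enumerate(stages, start=1):
        score = sum(feature(window) for feature in features)
        if score < threshold:
            return False, depth  # face no longer possible: reject early
    return True, len(stages)


# Toy stages: the "window" is just a number and each feature compares it to a cutoff.
stages = [
    ([lambda w: w > 1, lambda w: w > 2], 2),
    ([lambda w: w > 5], 1),
    ([lambda w: w > 8, lambda w: w % 2 == 0], 2),
]
print(run_cascade(0, stages))   # (False, 1): rejected at the first stage
print(run_cascade(6, stages))   # (False, 3): rejected at the last stage
print(run_cascade(10, stages))  # (True, 3): passed every stage
```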
  • 22 CASCADE STAGES, EARLY OUT BETWEEN EACH (diagram: STAGE 1, STAGE 2, … STAGE 21, STAGE 22 lead to FACE CONFIRMED; an early out between stages leads to NO FACE). Final HD Calculations: Search squares = 3.8 million; Average features per square = 124; Calculations per feature = 100; Calculations per frame = 47 GCalcs. Calculation Rate: 30 frames/sec = 1.4 TCalcs/second; 60 frames/sec = 2.8 TCalcs/second. …and this only gets front-facing faces
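The per-frame and per-second figures on this slide are a straight product of the three quantities it lists. A short check of that arithmetic, using only the slide's own numbers (variable names are illustrative):

```python
search_squares = 3_800_000    # per frame, from the scaled-frame slide
features_per_square = 124     # average, per the slide
calcs_per_feature = 100       # per the slide

calcs_per_frame = search_squares * features_per_square * calcs_per_feature
print(f"{calcs_per_frame / 1e9:.0f} GCalcs per frame")  # 47 GCalcs per frame

for fps in (30, 60):
    rate = calcs_per_frame * fps
    print(f"{fps} frames/sec: {rate / 1e12:.1f} TCalcs/second")  # 1.4, then 2.8
```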
  • CASCADE DEPTH ANALYSIS (chart: Cascade Depth, scale 0 to 25; legend bins 0-5, 5-10, 10-15, 15-20, 20-25)
  • UNBALANCING DUE TO EXITS IN EARLIER CASCADE STAGES (figure legend: Live, Dead). When running on the GPU, we run each search rectangle on a separate work item. Early out algorithms, like HAAR, exhibit divergence between work items: some work items exit early; their neighbors continue; SIMD packing suffers as a result.
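One way to see why SIMD packing suffers is to count how many lane-stage slots do useful work when work items in the same group exit at different stages. The model below is a simplification under stated assumptions: work items are packed 64 to a group (the group width is a parameter), and a group keeps running until its deepest item finishes. The function name and the sample data are hypothetical.

```python
def simd_utilization(exit_stages, simd_width=64):
    """Fraction of lane-stage slots doing useful work.

    `exit_stages[i]` is the cascade stage at which work item i stops.
    Items are packed into groups of `simd_width`; each group keeps running
    until its deepest item finishes, so lanes that exited early sit idle.
    """
    useful = 0
    occupied = 0
    for start in range(0, len(exit_stages), simd_width):
        group = exit_stages[start:start + simd_width]
        useful += sum(group)
        occupied += max(group) * simd_width
    return useful / occupied


# Hypothetical group of 64 windows: 60 rejected at stage 2, 4 run all 22 stages.
divergent = simd_utilization([2] * 60 + [22] * 4)
uniform = simd_utilization([5] * 64)
print(f"divergent group: {divergent:.1%}, uniform group: {uniform:.1%}")
```

In this toy case four deep work items keep the whole group busy for 22 stages, so most lanes idle for most of the run.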
  • PROCESSING TIME/STAGE, “Trinity” A10-4600M (6CU@497MHz, 4 cores@2700MHz). Chart: Time (ms), 0 to 100, per Cascade Stage (1 through 8, then 9-22); series GPU and CPU. Test system: AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)
  • PERFORMANCE CPU-VS-GPU, “Trinity” A10-4600M (6CU@497MHz, 4 cores@2700MHz). Chart: Images/Sec, 0 to 12, against Number of Cascade Stages on GPU (0 through 8, and 22); series CPU, HSA, GPU. Test system: AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G, 6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)
  • HAAR SOLUTION – RUN DIFFERENT CASCADES ON GPU AND CPU. By seamlessly sharing data between CPU and GPU, HSA allows the right processor to handle its appropriate workload. +2.5x INCREASED PERFORMANCE; -2.5x DECREASED ENERGY PER FRAME
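The split described on this slide can be sketched as two phases: the early stages run over every search window, and only the survivors go through the remaining stages. The sketch below runs both phases in plain Python purely to show the partitioning; it does not dispatch anything to a GPU, and `split_cascade`, the predicates, and the toy data are all hypothetical.

```python
def split_cascade(windows, stages, gpu_stages):
    """Run the first `gpu_stages` stages on every window, then the rest on survivors."""
    def passes(window, stage_slice):
        return all(stage(window) for stage in stage_slice)

    # Phase 1: wide, uniform work over all windows (the part suited to the GPU).
    survivors = [w for w in windows if passes(w, stages[:gpu_stages])]
    # Phase 2: few, divergent candidates (the part suited to the CPU).
    faces = [w for w in survivors if passes(w, stages[gpu_stages:])]
    return survivors, faces


# Toy data: windows are integers and each stage is a predicate.
stages = [
    lambda w: w % 2 == 0,
    lambda w: w % 3 == 0,
    lambda w: w % 5 == 0,
    lambda w: w > 40,
]
survivors, faces = split_cascade(range(1, 101), stages, gpu_stages=2)
print(len(survivors), faces)  # 16 [60, 90]
```

With shared coherent memory, the survivor list produced by the first phase would not need to be copied before the second phase reads it, which is the point the slide makes about sharing data between CPU and GPU.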
  • ACCELERATING MEMCACHED: CLOUD SERVER WORKLOAD
  • DATACENTER WORKLOAD: Memcached is generally used for short-term storage and caching, handling requests that would otherwise require database or file system accesses. Used by Facebook, YouTube, Twitter, Wikipedia, Flickr, and others. Effectively a large distributed hash table that responds to store and get requests received over the network. Conceptually: store(key, object); object = get(key)
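The conceptual store/get interface of a distributed hash table can be sketched in a few lines. This is an in-process toy, not the memcached protocol or client API: the class name, the dictionary-per-server layout, and the CRC32-modulo server selection are assumptions made for illustration.

```python
import zlib


class DistributedCache:
    """Toy distributed hash table: each key is hashed to pick one 'server' dict."""

    def __init__(self, num_servers=4):
        self._servers = [{} for _ in range(num_servers)]

    def _server_for(self, key):
        index = zlib.crc32(key.encode("utf-8")) % len(self._servers)
        return self._servers[index]

    def store(self, key, obj):
        self._server_for(key)[key] = obj

    def get(self, key):
        return self._server_for(key).get(key)  # None on a miss


cache = DistributedCache()
cache.store("user:42", {"name": "Ada"})
print(cache.get("user:42"))  # {'name': 'Ada'}
print(cache.get("missing"))  # None
```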
  • OFFLOADING MEMCACHED KEY LOOKUP TO THE GPU. Charts: Key Look Up Performance (scale 0 to 4) and Execution Breakdown (0 to 100%, split into Data Transfer and Execution) for Multithreaded CPU, Radeon HD 5870, “Trinity” A10-5800K, and Zacate E-350. Source: T. H. Hetherington, T. G. Rogers, L. Hsu, M. O’Connor, and T. M. Aamodt, “Characterizing and Evaluating a Key-Value Store Application on Heterogeneous CPU-GPU Systems,” Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS 2012), April 2012. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=6189209
  • ACCELERATING JAVA: GOING BEYOND NATIVE LANGUAGES
  • JAVA ENABLEMENT BY APARAPI. Developer creates Java™ source. Source compiled to class files (bytecode) using standard compiler (javac). Classes packaged and deployed using established Java™ tool chain. Aparapi = Runtime capable of converting Java™ bytecode to OpenCL™ for execution on any OpenCL™ 1.1+ capable device, OR execute via a thread pool if OpenCL™ is not available.
  • JAVA AND APARAPI HSA ENABLEMENT ROADMAP (diagram: four software stacks side by side). Components shown: Application (in all four stacks), Aparapi (three), JVM (three), HSA-Enabled JVM, OpenCL™, IR, HSA Runtime, LLVM Optimizer, HSAIL (three), HSA Finalizer (three). Every stack ends in CPU ISA and GPU ISA; the first targets CPU and GPU, the other three target HSA CPU and HSA GPU.
  • HSA SOFTWARE STACKS
  • INTRODUCING HSA BOLT – PARALLEL PRIMITIVES LIBRARY FOR HSA. Easily leverage the inherent power efficiency of GPU computing: common routines such as scan, sort, reduce, transform; more advanced routines like heterogeneous pipelines; Bolt library works with OpenCL or C++ AMP. Enjoy the unique advantages of the HSA platform: move the computation not the data. Finally a single source code base for the CPU and GPU! Developers can focus on core algorithms. See Ben Sander’s session tomorrow for a deep dive on HSA Bolt!
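For readers unfamiliar with the four primitives named on this slide, the sketch below shows what each one computes, using sequential Python standard-library calls. It is not Bolt's API and says nothing about how Bolt executes these routines; it only pins down the meaning of transform, reduce, scan, and sort.

```python
from functools import reduce
from itertools import accumulate
import operator

data = [3, 1, 4, 1, 5, 9, 2, 6]

transformed = [x * x for x in data]    # transform: apply a function to each element
total = reduce(operator.add, data)     # reduce: fold all elements into one value
prefix_sums = list(accumulate(data))   # scan: inclusive running sum
ordered = sorted(data)                 # sort: ascending order

print(transformed)  # [9, 1, 16, 1, 25, 81, 4, 36]
print(total)        # 31
print(prefix_sums)  # [3, 4, 8, 9, 14, 23, 25, 31]
print(ordered)      # [1, 1, 2, 3, 4, 5, 6, 9]
```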
  • HSA SOLUTION STACK (diagram). Components, roughly top to bottom: Application; Domain Specific Libs (Bolt, OpenCV™, … many others); OpenCL™ Runtime, DirectX Runtime, Other Runtime; HSA Runtime; Legacy Drivers; HSAIL; HSA Finalizer; GPU ISA; Knl Driver; Ctl; CPU(s), GPU(s), Other Accelerators. Legend: Application SW, HSA Software, Drivers, Differentiated HW.
  • AMD’S OPEN SOURCE COMMITMENT TO HSA. We will open source our Linux execution and compilation stack: jump start the ecosystem; allow a single shared implementation where appropriate; enable university research in all areas.
    Component Name | AMD Specific | Rationale
    HSA Bolt Library | No | Enable understanding and debug
    OpenCL HSAIL Code Generator | No | Enable research
    LLVM Contributions | No | Industry and academic collaboration
    HSA Assembler | No | Enable understanding and debug
    HSA Runtime | No | Standardize on a single runtime
    HSA Finalizer | Yes | Enable research and debug
    HSA Kernel Driver | Yes | For inclusion in Linux distros
  • EASE OF PROGRAMMING: CODE COMPLEXITY VS. PERFORMANCE
  • LINES-OF-CODE AND PERFORMANCE FOR DIFFERENT PROGRAMMING MODELS (Exemplary ISV “Hessian” Kernel). Chart: lines of code (LOC, 0 to 350) broken down into Init, Compile, Copy, Launch, Algorithm, and Copy-back, with Performance (0 to 35) overlaid, for Serial CPU, TBB, Intrinsics+TBB, OpenCL™-C, OpenCL™-C++, C++ AMP, and HSA Bolt. Test system: AMD A10-5800K APU with Radeon™ HD Graphics – CPU: 4 cores, 3800MHz (4200MHz Turbo); GPU: AMD Radeon HD 7660D, 6 compute units, 800MHz; 4GB RAM. Software – Windows 7 Professional SP1 (64-bit OS); AMD OpenCL™ 1.2 AMD-APP (937.2); Microsoft Visual Studio 11 Beta
  • THE HSA FUTURE: Highly productive programmers + Scalable performance + Power efficiency = AMAZING USER EXPERIENCES
  • ANNOUNCING… THE HSA FOUNDATION. PHILIP ROGERS, PRESIDENT
  • THE HSA FOUNDATION: ACTIVITIES. Nonprofit, open standardization body for HSA platforms that will own the development and evangelization of the architecture going forward. Make heterogeneous programming easy and a first-class pervasive complement to CPU computing. Continue to increase the power efficiency of HSA, keeping it the platform of choice from smartphones to the cloud. Bring to market strong development solutions (tools, libraries, OS runtimes) to drive innovative advanced content and applications. Foster growth of heterogeneous computing talent through HSA developer training and academic programs to drive both learning and innovation. © Copyright 2012 HSA Foundation. All Rights Reserved. 37
  • AMD’S CONTRIBUTION TO DATE. HSA draft specifications: HSA Programmer Reference Manual; HSA Hardware System Architecture Specification; HSA Software System Architecture Specification. Open source execution stack and compiler technology. HSA Bolt library – standard template library. Initial funding for incorporation.
  • FOUNDATION CATEGORIES OF MEMBERS: Founder, Promoter, Supporter, Contributor, Academic, Associate
  • HSA FOUNDATION INITIAL FOUNDERS: ARM, represented here by Jem Davies, ARM Fellow and VP of Technology, Media Processing
  • BRINGING VISUAL COMPUTING TO LIFE. JEM DAVIES, ARM Fellow, VP of Technology, Media Processing Division, ARM
  • ARM COMMITTED TO HETEROGENEOUS COMPUTING
  • HSA FOUNDATION INITIAL FOUNDERS: represented here by [name missing in transcript], ARM Fellow and VP of Technology, Media Processing; represented here by [name missing in transcript], President, Imagination Technologies USA; represented here by [name missing in transcript], President, MediaTek USA, Inc.; represented here by [name missing in transcript], Director, Linux Development Center; represented here by [name missing in transcript], CVP, Heterogeneous Applications and Developer Solutions
  • THE HSA FOUNDATION: www.hsafoundation.com
  • Disclaimer & Attribution: The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes. NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. AMD, the AMD arrow logo, the HSA logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. OpenCL™ is a trademark of Apple Corp. which is licensed to the Khronos Organization. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners. © 2012 Advanced Micro Devices, Inc.