THE PROGRAMMER’S GUIDE TO REACHING FOR THE CLOUD
PHIL ROGERS, CORPORATE FELLOW, AMD
NOV. 11, 2013
MODERN CLOUD WORKLOADS ARE HETEROGENEOUS
SCALAR CONTENT WITH A GROWING MIX OF PARALLEL CONTENT

 Video is expected to rep...
FUTURE TECHNOLOGY GROWTH WILL ACCELERATE THE TREND
 Rapid growth of Sensor Networks

RAPID GROWTH OF THE NUMBER OF THINGS...
HSA APU PROCESSORS OPERATE HARMONIOUSLY AT LOW POWER
EXAMPLE: VIDEO ENHANCEMENT

 Techniques include:
‒ Image Stabilizati...
HETEROGENEOUS PROCESSORS - EVERYWHERE
SMARTPHONES TO SUPER-COMPUTERS

Super computer
Dense Server
Tablet
Phone

Workstatio...
HOW DOES HSA MAKE THIS ALL WORK?
 Enables acceleration of languages like Java, C++ AMP and Python
 All processors use th...
HSA in 2013
HSA FOUNDATION AT LAUNCH
BORN IN JUNE 2012

Founders

9 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
HSA FOUNDATION TODAY – NOVEMBER 2013
A GROWING AND POWERFUL FAMILY

Founders
Promoters
Supporters
Contributors

TBA at APU...
HSA FOUNDATION PROGRESS
WHAT AN AMAZING FIRST YEAR

 Membership growing rapidly
‒ 2-3 new members per month
‒ Universitie...
PROGRAMMING LANGUAGES PROLIFERATING ON HSA
OpenCL™
App

Java App

C++ AMP
App

Python
App

OpenCL
Runtime

Java JVM
(Sumat...
Workloads
HIGH EFFICIENCY VIDEO CODEC – HEVC (H.265)
VALUE PROPOSITION

HEVC VISUAL QUALITY IS
SIGNIFICANTLY BETTER THAN
H.264 AT AN...
HIGH EFFICIENCY VIDEO CODEC – HEVC (H.265)
WHY HEVC WILL PROLIFERATE
 The next generation MPEG video encoding standard
 ...
HEVC (H.265) ACCELERATION
EFFICIENT CLOUD DEPLOYMENT

ALL STAGES OF HEVC ARE
ACCELERATED ON THE APU






Decrypt
Dec...
OVERVIEW OF B+ TREES
 B+ Trees are a special case of B Trees

 A B+ Tree …
‒ is a dynamic, multi-level index
‒ Is effici...
APPLICATIONS THAT USE B/B+ TREES

primary data store on the clientside

multi-data center key-value store

Mail, Safari, i...
HOW WE ACCELERATE
 Utilize coarse-grained parallelism in B+ Tree searches
‒ Perform many queries in parallel
‒ Increase m...
RESULTS
1M search queries in parallel
7

 Input B+ Tree contains 112 million
keys and uses 6GB of memory

 Software: Ope...
REVERSE TIME MIGRATION (RTM)

Land crews

 A technique for creating images based on
sensor data to improve seismic interp...
TEXT ANALYTICS – HADOOP TERASORT AND BIG DATA SEARCH
MINING BIG DATA
 Multi-stage pipeline or parallel
processing stages
...
Programming
Languages
PROGRAMMING MODELS EMBRACING HSAIL AND HSA
THE RIGHT LEVEL OF ABSTRACTION

UNDER DEVELOPMENT





Java: Project Sumatr...
HSA ENABLES DEVELOPERS TO LEVERAGE HC … EASILY & NATURALLY

PREFERRED PROGRAMMING
LANGUAGES

TRANSPARENT CALLS TO POPULAR
...
C++ AMP ACCELERATION GOES MULTI-PLATFORM
 Herb Sutter Announced C++ AMP for the Windows® Platform at ADS 2011
 We very m...
HSA ENABLEMENT OF JAVA
JAVA 7 – OpenCL ENABLED APARAPI

JAVA 8 – HSA ENABLED APARAPI

JAVA 9 – HSA ENABLED JAVA (SUMATRA)
...
JAVA DEMO

WELCOME GARY FROST TO THE STAGE

NBODY REVISTED
 NBody problem:
‒ Calculate the position of ‘N’ bodies in 3D space by computing the gravitational effect e...
JAVA 8’S ‘PROJECT LAMBDA’ SIMPLIFIES PARALLEL PROGRAMMING
 Offers an alternate syntax for processing arrays/collections o...
JAVA DEMO

JAVA AND THE CLOUD

THE RIGHT LANGUAGE WITH ACCELERATION ON CLOUD APUS

 Java 8 and Java 9 provide parallel acceleration
...
Programming Tools
ANNOUNCING AMD’S UNIFIED SDK
 Access to AMD APU and GPU programmable
components
 Component installer - choose just what ...
ANNOUNCING AMD

V1.3

 AMD’s comprehensive heterogeneous
developer tool suite including:
‒ CPU and GPU Profiling
‒ GPU ke...
OPEN SOURCE LIBRARIES ACCELERATED BY AMD

OpenCV

Bolt

clMath

Aparapi

 Most popular computer
vision library

 C++ tem...
AMD APUS, HSA – CLIENT TO THE CLOUD
A CONVERGENCE AT THE RIGHT TIME

 Parallel workloads are booming
‒ Acceleration where...
A SPECIAL GUEST

Gary Campbell

Infrastructure Technology Strategy CTO
HP

DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain te...
GARY CAMPBELL
INFRASTRUCTURE TECHNOLOGY STRATEGY CTO
HP
MOONSHOT SERVER CARTRIDGE WITH AMD
FUTURE AVAILABILITY

MOONSHOT SERVER CARTRIDGE
WITH AMD
* Future availability

Cartridg...
AMD + HP MOONSHOT = BEST SOLUTION FOR HOSTED DESKTOPS

• Built on HP Moonshot technology for 45% of remote desktop market
...
HP INVESTING IN INNOVATION ACROSS THE ECOSYSTEM

OFFERING THE RESOURCES AND SCALE TO HELP DESIGNERS REACH MAINSTREAM MARKE...
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by Phil Rogers, AMD Corporate Fellow, AMD
Keynote presentation, The Programmers Guide to Reaching for the Cloud, by Phil Rogers, AMD Corporate Fellow, AMD, at the AMD Developer Summit (APU13), Nov. 11-13, 2013.

Published in: Technology
Transcript of "Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by Phil Rogers, AMD Corporate Fellow, AMD"

  1. 1. THE PROGRAMMER’S GUIDE TO REACHING FOR THE CLOUD PHIL ROGERS, CORPORATE FELLOW, AMD NOV. 11, 2013
  2. 2. MODERN CLOUD WORKLOADS ARE HETEROGENEOUS SCALAR CONTENT WITH A GROWING MIX OF PARALLEL CONTENT  Video is expected to represent two thirds of mobile data traffic by 2017 ‒ Video is continuously being captured, uploaded, transcoded and streamed ‒ Video processing is inherently parallel … and can be accelerated  Big data growing exponentially with Exabytes of data crawled monthly ‒ Indexing the web and extracting high definition information ‒ Map reduce is a heterogeneous workload  Natural User Interfaces are still in their infancy ‒ Accurate extraction of meaning from gesture and voice ‒ Getting to the fingertips and voice inflections NEED TO SIMULTANEOUSLY INCREASE PERFORMANCE AND REDUCE POWER 3 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
  3. 3. FUTURE TECHNOLOGY GROWTH WILL ACCELERATE THE TREND
      RAPID GROWTH OF THE NUMBER OF THINGS CONNECTED TO THE INTERNET
      • Rapid growth of Sensor Networks
        ‒ Drives exponential increase in data
      • Internet of Everything (IoE) results in explosion of data sources
        ‒ Another exponential growth in data at local and cloud level
      • Context Aware Computing is a Huge Big-Data Problem
        ‒ Both local and cloud compute must get faster/lower power
      [Chart, 1995 to 2020, reaching 50B: “Fixed” Computing (you go to the device); Mobility / BYOD (the device goes with you); Internet of Things (age of devices); Internet of Everything (people, process, data, things)]
      HOW MUCH VALUE IS AT STAKE IN THE IOE ECONOMY? $14.4 trillion: $9.5 trillion from industry-specific use cases, $4.9 trillion from cross-industry use cases
      DRIVING FUTURE DEMAND FOR LOCAL AND CLOUD PARALLEL EFFICIENCY
      Source: Cisco IBSG, 2013
  4. 4. HSA APU PROCESSORS OPERATE HARMONIOUSLY AT LOW POWER EXAMPLE: VIDEO ENHANCEMENT  Techniques include: ‒ Image Stabilization, Super Resolution, Deblur, Deinterlace, Lighting & Contrast  Enhancements examine pixels from a large number of video frames ‒ Super-resolution based on information from surrounding frames  Algorithms can be run on multiple processors in the APU ‒ CPU, GPU, DSPs, Fixed Function Accelerators ‒ Convolutions, motion estimation, histograms, format conversions, etc. ‒ Processing flows freely between processors for best efficiency 5 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
  5. 5. HETEROGENEOUS PROCESSORS - EVERYWHERE SMARTPHONES TO SUPER-COMPUTERS Super computer Dense Server Tablet Phone Workstation Notebook A SINGLE SCALABLE ARCHITECTURE FOR THE WORLD’S PROGRAMMERS IS DEMANDED AT THIS POINT 6 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
  6. 6. HOW DOES HSA MAKE THIS ALL WORK?  Enables acceleration of languages like Java, C++ AMP and Python  All processors use the same addresses, and can share data structures in place  Heterogeneous computing can use all of virtual and physical memory  Extends multicore coherency to the GPU and other processors  Pass work quickly between the processors  Enables quality of service HSA FOUNDATION – BUILDING THE ECOSYSTEM 7 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
  7. 7. HSA in 2013
  8. 8. HSA FOUNDATION AT LAUNCH BORN IN JUNE 2012 Founders 9 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
  9. 9. HSA FOUNDATION TODAY – NOVEMBER 2013 A GROWING AND POWERFUL FAMILY Founders Promoters Supporters Contributors TBA at APU-13 Universities 10 | AMD DEVELOPER SUMMIT | NOVEMBER 2013 NTHU Programming Language Lab NTHU System Software Lab COMPUTER SCIENCE
  10. 10. HSA FOUNDATION PROGRESS WHAT AN AMAZING FIRST YEAR  Membership growing rapidly ‒ 2-3 new members per month ‒ Universities enrolling  Four working groups generating specifications ‒ HSA Programmers Reference Manual published ‒ HSA System Architecture spec going to ratification by the end of the year ‒ Runtime WG and Tools WG will publish early next year  HSA Development platforms to ship in early 2014 11 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
  11. 11. PROGRAMMING LANGUAGES PROLIFERATING ON HSA OpenCL™ App Java App C++ AMP App Python App OpenCL Runtime Java JVM (Sumatra) Various Runtimes Fabric Engine RT HSAIL HSA Helper Libraries HSA Core Runtime Kernel Fusion Driver (KFD) 12 | AMD DEVELOPER SUMMIT | NOVEMBER 2013 HSA Finalizer
  12. 12. Workloads
  13. 13. HIGH EFFICIENCY VIDEO CODEC – HEVC (H.265) VALUE PROPOSITION HEVC VISUAL QUALITY IS SIGNIFICANTLY BETTER THAN H.264 AT ANY GIVEN BIT RATE 30% TO 50% MORE EFFICIENT THAN H.264 AT 1080P RESOLUTION 4K Ultra HDTV Sony XBR $4999 H.265 @ 500 kbps H.264 @ 500 kbps 14 | AMD DEVELOPER SUMMIT | NOVEMBER 2013 4K VIDEO BENEFITS ARE EVEN MORE SIGNIFICANT WITH HEVC 30% to 50% 4K Video Cameras GoPro $399
  14. 14. HIGH EFFICIENCY VIDEO CODEC – HEVC (H.265)
      WHY HEVC WILL PROLIFERATE
      • The next generation MPEG video encoding standard
      • Significantly higher efficiency (up to 50% lower bit rates at given quality) than AVC (H.264)
      • Highly beneficial for HD video (1080p or below)
      • Especially beneficial for 4K video
      • Scales to 8K Ultra High Definition video (up to 8192×4320)
      • Computationally complex, but by design easier to parallelize than H.264
      [Chart: mobile traffic in exabytes per month (0 to 12), 2012 to 2017, by category (Mobile Video, Mobile Web/Data, Mobile M2M, Mobile File Sharing); traffic share labels 66.5%, 24.9%, 5.1%, 3.5%]
      CLOUD VIDEO PROVIDERS NEED THE HIGHER COMPRESSION FOR QUALITY OF SERVICE
      Source: Cisco VNI Mobile Forecast, 2013
  15. 15. HEVC (H.265) ACCELERATION
      EFFICIENT CLOUD DEPLOYMENT
      ALL STAGES OF HEVC ARE ACCELERATED ON THE APU
      • Decrypt
      • Decode and decompress
      • Scaling and Enhancement
      • Encode and compress
      • Encrypt
      ENCODE IS THE HEAVIEST STAGE
      • Leverage point for compression
      • Highly parallel
      • Algorithms improve monthly
      • Must stay programmable
      H.265 ENCODING IS 5 – 10X MORE COMPUTATIONALLY COMPLEX THAN H.264
      • Picture can be divided into Macroblock regions with a much wider range of sizes and shapes
      • Intra prediction has 33 directional modes compared to 8 for H.264
  16. 16. OVERVIEW OF B+ TREES  B+ Trees are a special case of B Trees  A B+ Tree … ‒ is a dynamic, multi-level index ‒ Is efficient for retrieval of data, stored in a block-oriented context  Fundamental data structure used in several popular database management systems ‒ SQLite ‒ CouchDB  Order (b) of a B+ Tree measures the capacity of its nodes 3 2 5 4 6 7 1 2 3 4 5 6 7 8 d1 d2 d3 d4 d5 d6 d7 d8 17 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
  17. 17. APPLICATIONS THAT USE B/B+ TREES primary data store on the clientside multi-data center key-value store Mail, Safari, iPhone, iPod, iTunes market-data framework Firefox and Thunderbird large hadron collider Android, Chrome 18 | AMD DEVELOPER SUMMIT | NOVEMBER 2013 http://www.sqlite.org/famous.html http://wiki.apache.org/couchdb/CouchDB_in_the_wild
  18. 18. HOW WE ACCELERATE  Utilize coarse-grained parallelism in B+ Tree searches ‒ Perform many queries in parallel ‒ Increase memory bandwidth utilization with parallel reads ‒ Increase throughput (transactions per second for OLTP)  B+ Tree searches on an HSA enabled APU ‒ Allows much larger B+ Trees to be searched, than traditional GPU compute ‒ Eliminates data-copies since CPU and GPU cores can access the same memory 19 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
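The coarse-grained approach on this slide (many independent queries against one shared, read-only tree) can be sketched in plain Java. This is a CPU-only illustration using parallel streams, not the OpenCL-on-HSA implementation measured in the talk. The class `StaticBPlusTree`, its bulk-load constructor, the `NOT_FOUND` sentinel and the `searchAll` method are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.IntStream;

/** Read-only B+ Tree bulk-loaded from sorted keys (hypothetical helper for illustration). */
final class StaticBPlusTree {
    static final long NOT_FOUND = -1L;   // assumption: stored values are never -1

    private final int order;             // maximum number of entries per node
    private final long[] keys;           // leaf level: every key, ascending
    private final long[] values;         // values[i] is the payload for keys[i]
    // index.get(0) sits directly above the leaves; the last element is the root level.
    private final List<long[]> index = new ArrayList<>();

    StaticBPlusTree(long[] sortedKeys, long[] values, int order) {
        if (order < 2) throw new IllegalArgumentException("order must be at least 2");
        if (sortedKeys.length != values.length) throw new IllegalArgumentException("length mismatch");
        this.order = order;
        this.keys = sortedKeys;
        this.values = values;
        long[] level = sortedKeys;
        // Each parent entry stores the smallest key of one child node.
        while (level.length > order) {
            int nodes = (level.length + order - 1) / order;
            long[] parent = new long[nodes];
            for (int i = 0; i < nodes; i++) parent[i] = level[i * order];
            index.add(parent);
            level = parent;
        }
    }

    /** Walks from the root level down to one leaf and scans it. */
    long search(long key) {
        int node = 0;
        for (int d = index.size() - 1; d >= 0; d--) {
            long[] level = index.get(d);
            int lo = node * order;
            int hi = Math.min(lo + order, level.length);
            int child = lo;
            // Pick the last child whose smallest key is <= the search key.
            while (child + 1 < hi && level[child + 1] <= key) child++;
            node = child;
        }
        int lo = node * order;
        int hi = Math.min(lo + order, keys.length);
        for (int i = lo; i < hi; i++) {
            if (keys[i] == key) return values[i];
        }
        return NOT_FOUND;
    }

    /** Runs every query independently; each task writes only its own output slot. */
    long[] searchAll(long[] queries) {
        long[] out = new long[queries.length];
        IntStream.range(0, queries.length)
                 .parallel()
                 .forEach(i -> out[i] = search(queries[i]));
        return out;
    }
}
```

Because the tree is never modified during a batch, the queries need no locking, which is the property that lets them be spread across many cores.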
  19. 19. RESULTS
      1M search queries in parallel
      • Input B+ Tree contains 112 million keys and uses 6GB of memory
      • Software: OpenCL on HSA
      • Hardware: AMD “Kaveri” APU with Quad Core CPU and 8 GCN Compute Units at 35W TDP
      • Baseline: 4-core OpenMP + hand-tuned SSE CPU implementation
      [Chart: Speedup (y-axis, 0 to 7) versus Order of B+ Tree (x-axis: 8, 16, 32, 64, 128)]
      Results measured in AMD Labs on “Kaveri” APU, 35W TDP, 16GB DRAM
  20. 20. REVERSE TIME MIGRATION (RTM)
      • A technique for creating images based on sensor data to improve seismic interpretations done by geophysicists
      • A memory-intensive and highly parallel algorithm
      • RTM is run on massive data sets
      • A natural scale out algorithm
      • Often run today on 100K node CPU systems
      • Bringing this to HSA and APU based supercomputing will increase performance for current sensor arrays, and allow more sensors and accuracy in the future.
      [Images: Land crews; Marine crews]
      HOWEVER, SPEED OF PROCESSING AND INTERPRETATION IS A CRITICAL BOTTLENECK IN MAKING FULL USE OF ACQUISITION ASSETS
  21. 21. TEXT ANALYTICS – HADOOP TERASORT AND BIG DATA SEARCH
      MINING BIG DATA
      • Multi-stage pipeline or parallel processing stages
      • Traditional GPU Compute is challenged by copies
      • APU with HSA accelerates each stage in place
        ‒ Sort
        ‒ Compression
        ‒ Regular expression parsing
        ‒ CRC generation
      • Acceleration of large data search scales out across the cluster of APU nodes
      [Diagram: Input HDFS (split 0, split 1, split 2) → map → sort → copy → merge → reduce → part 0, part 1 → HDFS Replication → Output HDFS]
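As a rough illustration of one stage named on this slide, CRC generation, the sketch below checksums several input splits independently using the JDK's `java.util.zip.CRC32`. It runs on CPU cores through a parallel stream and is not Hadoop code. The class `SplitChecksums` and its methods are hypothetical names, and each split is assumed to already be in memory as a `byte[]`.

```java
import java.util.Arrays;
import java.util.zip.CRC32;

/** Per-split CRC generation, one independent task per split (illustrative only). */
final class SplitChecksums {

    /** CRC-32 of a single in-memory split. */
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Checksums all splits in parallel; result[i] belongs to splits[i]. */
    static long[] crcPerSplit(byte[][] splits) {
        return Arrays.stream(splits)
                     .parallel()
                     .mapToLong(SplitChecksums::crcOf)
                     .toArray();
    }
}
```

Each split is processed where it already sits in memory, with no copy to a separate device buffer, which is the property the slide attributes to running the stages in place.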
  22. 22. Programming Languages
  23. 23. PROGRAMMING MODELS EMBRACING HSAIL AND HSA
      THE RIGHT LEVEL OF ABSTRACTION
      UNDER DEVELOPMENT
      • Java: Project Sumatra OpenJDK 9
      • OpenMP from SuSE
      • C++ AMP, based on CLANG/LLVM
      • Python and KL from Fabric Engine
      NEXT
      • DSLs: Halide, Julia, Rust
      • Fortran
      • JavaScript
      • Open Shading Language
      • R
  24. 24. HSA ENABLES DEVELOPERS TO LEVERAGE HC … EASILY & NATURALLY PREFERRED PROGRAMMING LANGUAGES TRANSPARENT CALLS TO POPULAR LIBRARIES  Java, C++, OpenMP, Python *  OpenCV, SciPy, NumPy, ImageMagick, Bolt, …  SVM, Coherence, GPU Enqueue  OpenJDK/Sumatra, Fabric Engine  Arbitrary data structures, SVM, Coherence, User mode queueing  OpenCV API, Bolt STL library * Java 8, C++ AMP, OpenMP 4.0 next generation standards and extensions 25 | AMD DEVELOPER SUMMIT | NOVEMBER 2013 USING CONVENTIONAL METHODS  Arbitrary data structures, malloc, function pointers, callbacks, recursion, semaphores, atomics  SVM, Coherence, User-mode queueing, GPU Enqueue, HSAIL  Linked-list/tree traversal + other complex shared host data structures
  25. 25. C++ AMP ACCELERATION GOES MULTI-PLATFORM
      • Herb Sutter announced C++ AMP for the Windows® Platform at ADS 2011
      • We very much liked the single source model of development, and decided to extend it to be multi-platform
      • Today we are announcing C++ AMP is moving beyond Microsoft® Windows to embrace Linux. We will offer this acceleration on both our APUs and our discrete GPUs
      • We are also bringing Bolt STL Library support to C++ AMP
      [Diagram labels: C++ AMP; CLANG Front-end; LLVM Compiler; LLVM-IR or SPIR 1.2; HSAIL; SPIR 1.2; Any HSA Implementation; Any OpenCL™+SPIR Implementation]
      AVAILABLE IN OPEN SOURCE 1H-2014
  26. 26. HSA ENABLEMENT OF JAVA
      JAVA 7 – OpenCL ENABLED APARAPI
      • AMD initiated Open Source project
      • APIs for data parallel algorithms
        ‒ GPU accelerate Java applications
        ‒ No need to learn OpenCL™
      • Active community captured mindshare
        ‒ ~20 contributors
        ‒ >7000 downloads
        ‒ ~150 visits per day
      [Stack diagram labels: Java Application; APARAPI API; OpenCL™; OpenCL™ Compiler & Runtime; JVM; CPU ISA, GPU ISA; CPU, GPU]
      JAVA 8 – HSA ENABLED APARAPI
      • Java 8 brings Stream + Lambda API.
        ‒ More natural way of expressing data parallel algorithms
        ‒ Initially targeted at multi-core.
      • APARAPI will:
        ‒ Support Java 8 Lambdas
        ‒ Dispatch code to HSA enabled devices at runtime via HSAIL
      • We will provide HSA Enabled Aparapi on Java 8 to bridge between Aparapi on Java 7 and HSA/Sumatra on Java 9
      [Stack diagram labels: Java Application; APARAPI + Lambda API; HSAIL; HSA Finalizer & Runtime; JVM; CPU ISA, GPU ISA; CPU, GPU]
      JAVA 9 – HSA ENABLED JAVA (SUMATRA)
      • Adds native GPU acceleration to Java Virtual Machine (JVM)
      • Developer uses JDK Lambda, Stream API
      • JVM uses GRAAL compiler to generate HSAIL
      • JVM decides at runtime to execute on either CPU or GPU depending on workload characteristics.
      [Stack diagram labels: Java Application; Java JDK Stream + Lambda API; Java GRAAL JIT backend; HSAIL; HSA Finalizer & Runtime; JVM; CPU ISA, GPU ISA; CPU, GPU]
  27. 27. JAVA DEMO WELCOME GARY FROST TO THE STAGE 28 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
  28. 28. NBODY REVISITED
      • NBody problem:
        ‒ Calculate the position of ‘N’ bodies in 3D space by computing the gravitational effect each has on all of the others and updating its position.
      • A sequential Java NBody implementation would start with an Object for each Body.
          public class Body {
              // State of object
              private float x, y, z, m, vx, vy, vz;
              // Method to update position relative to other bodies
              void updatePosition(Body[] bodies) { /* code omitted */ }
          }
      • Then we would iterate over all bodies, updating the position of each.
          for (Body b : bodies) { b.updatePosition(bodies); }
      • A pre-Java 8 ‘parallel’ version would not fit so nicely on this slide ;)
  29. 29. JAVA 8’S ‘PROJECT LAMBDA’ SIMPLIFIES PARALLEL PROGRAMMING
      • Offers an alternate syntax for processing arrays/collections of data
          for (Body b : bodies) b.updatePosition(bodies);
          Arrays.stream(bodies)                           // wrap array in a stream
                .forEach(b -> b.updatePosition(bodies));
      • To process a stream in parallel we just tag the stream with the parallel() modifier
          Arrays.stream(bodies)                           // wrap an array in a stream
                .parallel()                               // tag the stream as parallel
                .forEach(b -> b.updatePosition(bodies));
      • In Java 8 a parallel stream executes across all CPU cores.
      • In Java 9 (Sumatra) a parallel stream executes across all CPU and GPU cores
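A runnable version of the NBody idea, as a minimal sketch rather than the code shown in the demo. The slides omit the body of `updatePosition`, so the physics here is an assumption: gravitational units with G = 1, a small `SOFTENING` term to avoid division by zero, and a simple Euler step. The single `updatePosition` call is split into two hypothetical methods, `updateVelocity` and `move`, so that a parallel pass never reads a position another thread is writing.

```java
import java.util.Arrays;
import java.util.stream.Stream;

final class Body {
    static final float SOFTENING = 1e-3f;  // assumption: keeps distances away from zero
    float x, y, z, m, vx, vy, vz;

    Body(float x, float y, float z, float m) {
        this.x = x; this.y = y; this.z = z; this.m = m;
    }

    /** Phase 1: reads every body's position, writes only this body's velocity. */
    void updateVelocity(Body[] bodies, float dt) {
        float ax = 0f, ay = 0f, az = 0f;
        for (Body other : bodies) {
            if (other == this) continue;
            float dx = other.x - x, dy = other.y - y, dz = other.z - z;
            float d2 = dx * dx + dy * dy + dz * dz + SOFTENING;
            float s = other.m / (d2 * (float) Math.sqrt(d2));  // G = 1 units
            ax += dx * s; ay += dy * s; az += dz * s;
        }
        vx += ax * dt; vy += ay * dt; vz += az * dt;
    }

    /** Phase 2: reads and writes only this body's own state. */
    void move(float dt) {
        x += vx * dt; y += vy * dt; z += vz * dt;
    }
}

final class NBodyStep {
    /** Advances the simulation by one time step, sequentially or in parallel. */
    static void step(Body[] bodies, float dt, boolean parallel) {
        stream(bodies, parallel).forEach(b -> b.updateVelocity(bodies, dt));
        stream(bodies, parallel).forEach(b -> b.move(dt));
    }

    private static Stream<Body> stream(Body[] bodies, boolean parallel) {
        Stream<Body> s = Arrays.stream(bodies);   // wrap the array in a stream
        return parallel ? s.parallel() : s;       // optionally tag it as parallel
    }
}
```

Calling `NBodyStep.step(bodies, 0.01f, true)` in a loop advances the system using all CPU cores. The two-phase split is a design choice for this sketch: with an in-place `updatePosition`, a parallel run could let one body read another's half-updated coordinates.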
  30. 30. JAVA DEMO 31 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
  31. 31. JAVA AND THE CLOUD THE RIGHT LANGUAGE WITH ACCELERATION ON CLOUD APUS  Java 8 and Java 9 provide parallel acceleration  Parallel workloads are proliferating in the cloud  Hadoop framework for scale out  HSA APUs provide workload acceleration DON’T MISS THE KEYNOTE TOMORROW FROM ORACLE’S NANDINI RAMANI “THE ROLE OF JAVA™ IN HETEROGENEOUS COMPUTING, AND HOW YOU CAN HELP” 32 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
  32. 32. Programming Tools
  33. 33. ANNOUNCING AMD’S UNIFIED SDK
      • Access to AMD APU and GPU programmable components
      • Component installer - choose just what you need
      • Initial release includes:
        ‒ APP SDK v2.9
        ‒ Media SDK 1.0 Beta
      APP SDK 2.9
      • Web-based sample browser
      • Supports programming standards: OpenCL™, C++ AMP
      • Code samples for accelerated open source libraries:
        ‒ OpenCV, OpenNI, Bolt, Aparapi
      • OpenCL™ source editing plug-in for Visual Studio
      • Now supports CMake
      MEDIA SDK 1.0 BETA
      • GPU accelerated video pre/post processing library
      • Leverage AMD's media encode/decode acceleration blocks
      • Library for low latency video encoding
      • Supports both Windows Store and Classic desktop
  34. 34. ANNOUNCING AMD V1.3  AMD’s comprehensive heterogeneous developer tool suite including: ‒ CPU and GPU Profiling ‒ GPU kernel Debugging ‒ GPU kernel analysis  New features in version 1.3: ‒ Supports Java ‒ Integrated static kernel analysis ‒ Remote debugging/profiling ‒ Supports latest AMD APU and GPU products CPU PROFILER GPU PROFILER GPU DEBUGGER STATIC KERNEL ANALYZER  Time-based profiling  OpenCL™ Application Trace  Analyze call-chain relationships  Profile OpenCL kernels  Compile, analyze and disassemble OpenCL Kernels  Java profiling with inline function support  Timeline visualization of GPU counter data  Real-time OpenCL kernel debugging with stepping and variable display  Cache-line utilization profiling  Kernel Occupancy Viewer  Supports latest AMD processors  Remote GPU Profiling 35 | AMD DEVELOPER SUMMIT | NOVEMBER 2013  OpenCL and OpenGL API Statistics  Object visualization  Remote GPU debugging  View kernel compilation errors/warnings  Estimate kernel performance  View generated ISA code  View registers
  35. 35. OPEN SOURCE LIBRARIES ACCELERATED BY AMD
      OpenCV
      • Most popular computer vision library
      • Now with many OpenCL™ accelerated functions
      Bolt
      • C++ template library
      • Provides GPU off-load for common data-parallel algorithms
      • Now with cross-OS support and improved performance/functionality
      clMath
      • AMD released APPML as open source to create clMath
      • Accelerated BLAS and FFT libraries
      • Accessible from Fortran, C and C++
      Aparapi
      • OpenCL™ accelerated Java 7
      • Java APIs for data parallel algorithms (no need to learn OpenCL™)
  36. 36. AMD APUS, HSA – CLIENT TO THE CLOUD A CONVERGENCE AT THE RIGHT TIME  Parallel workloads are booming ‒ Acceleration where the data is ‒ On the client for a snappy user experience ‒ In the cloud for scalable services  HSA enabled APUs in the cloud ‒ Big data analytics ‒ Video processing ‒ Science, imaging, genomics ‒ Unleashing the Java development community  Acceleration at all tiers of the cloud ‒ Data centers, media hubs, cloud periphery 37 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
  37. 37. A SPECIAL GUEST Gary Campbell Infrastructure Technology Strategy CTO HP 38 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
  38. 38. DISCLAIMER & ATTRIBUTION The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION © 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. OpenCL is a trademark of Apple Inc. and Microsoft and Windows are trademarks of Microsoft Corp. Other names are for informational purposes only and may be trademarks of their respective owners. 39 | AMD DEVELOPER SUMMIT | NOVEMBER 2013
  39. 39. GARY CAMPBELL INFRASTRUCTURE TECHNOLOGY STRATEGY CTO HP
  40. 40. MOONSHOT SERVER CARTRIDGE WITH AMD *
      * Future availability
      Cartridge config
      • 4 x Quad-core 1.5 GHz
      • 8 x 1GbE NICs
      • 4 x 8GB Memory
      • 32GB iSSD per SOC
      Chassis config
      • 45 AMD Opteron X2150 cartridges
      • Dual 180 x 1GbE switch modules
      • Dual 40GbE uplink modules
      • 4 x 1500 watt platinum PS (N+1)
      • Chassis management module
      • 5 Dual-rotor, hot plug fans (N+1)
  41. 41. AMD + HP MOONSHOT = BEST SOLUTION FOR HOSTED DESKTOPS
      • Built on HP Moonshot technology for 45% of remote desktop market
      • Dedicated CPU and GPU support for 180 users in a single chassis
      • Predictable cost, scaling, and performance with pre-determined sizing
      SIMPLIFIED DEPLOYMENT: Up to 90% faster deployment*
      CONSISTENT USER PERFORMANCE: Up to 6x faster graphics frames per second*
      REDUCED TCO: Up to 44% better TCO & 12% less power*
      * Based on HP internal estimates compared to traditional desktops
  42. 42. HP INVESTING IN INNOVATION ACROSS THE ECOSYSTEM OFFERING THE RESOURCES AND SCALE TO HELP DESIGNERS REACH MAINSTREAM MARKETS HP Pathfinder Innovation Ecosystem Moonshot Concierge Support Select technology partnerships with the industry’s “best of the best” innovators Discovery Labs in U.S., France, China and Singapore plus HP expertise and services 3x Leading Technology Partnerships Solution Builder Program Faster time to innovation $ HP Discovery Lab Service & Consulting Acquire on your terms Watch Discovery Lab video http://www.youtube.com/watch?v=ZuO-zcmjvgw Email Discovery Lab (hpdiscovery.lab@hp.com) to find out more 43 | AMD DEVELOPER SUMMIT | NOVEMBER 2013