Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by Mike Muller, Chief Technology Officer, ARM

Keynote presentation "Is There Anything New in Heterogeneous Computing?" by Mike Muller, Chief Technology Officer, ARM, at the AMD Developer Summit (APU13), Nov. 11-13, 2013.

Transcript

  • 1. Is there anything new in heterogeneous computing? Mike Muller, CTO
  • 2. Evolution [timeline figure, 1960-2020: Embedded, PC, Consumer, Mobile Computing, Smart Appliances, Cloud Server computing, IoT, Wearable Intelligence; year labels 77, 82, 89, 93, 97, 07, 10, 13]
  • 3. What’s the Innovation? [figure labels: Wireless, 3G, MEMS, CCD, GPS, Media, Social Media?, Semiconductor Process?]
  • 4. Mobility Trends: CMOS [figure: NMOS and PMOS carrier mobility, 10 to 10,000 cm²/(V·s), 1990-2025, annotated with device options (strain, HKMG, FinFET, planar CMOS, III-V, Ge, HNW, VNW, NEMS switches, spintronics, 2D materials: C, MoS₂), nodes (14nm, 10nm, 7nm, 5nm, 3.5nm), interconnect (Al wires, Cu wires, 3DIC, graphene wire, CNT via, opto I/O, opto interconnect) and patterning (LELE, LELELE, SADP, SAQP, EUV, EUV LELE, EUV + DSA, EUV + DWEB, seq. 3D)]
  • 5. Printing: Moore’s Law and Ink Jets [figure, 1980-2020: inkjet drops/second (1E3 to 1E11) and 1/drop size in pL⁻¹ (1E-3 to 1E1); annotations: 10 nozzles, 10,000 nozzles, 100's of microns, 10's of microns]
  • 6. Printing and Imprinting Thin Film Transistors (TFT)
    - Can be transparent, biodegradable and even ingestible
    - Unit cost 1,000× less than mainstream CMOS: CMOS @ $40,000/m² vs. TFT @ $10/m²
    - Printing: CAPEX can be less than $1,000; 350 dpi = 200 µm @ 20 m/s; can print batteries and antennas; mainly organic, at ~20 volts
    - Imprint: CAPEX of a $2M DVD press; high volume; better controllability, hence higher density and performance; 1 µm today, scaling to 50 nm features as used today for Blu-ray discs; mainly inorganic, NMOS only, at ~2 volts
  • 7. Mobility Trends: CMOS & Thin Film Transistors [figure: carrier mobility, 0.00001 to 10,000 cm²/(V·s), 1990-2025, for conventional NMOS, conventional PMOS and TFT; CPU annotations: ARM1 (3µ, 6MHz), Cortex-M0 (2µ, 20kHz)]
  • 8. Top Right and Bottom Left
  • 9. Is There Anything New in Heterogeneous Computing?
    OpenCL results, normalized to GPU OpenCL on GPU = 1.00:
                            Vector Add   Reduction   Matrix Mul
      GPU OpenCL on GPU     1.00         1.00        1.00
      GPU OpenCL on FPGA    0.14         0.02        0.89
      FPGA OpenCL on FPGA   1.71         1.62        31.85
    1998: manual partitioning; C & assembler; ARM + DSP
    2013: manual partitioning; C++ & OpenCL/RenderScript; ARM + GPU
  • 10. How Do People Program? [figure: ~20M programmers; Web, Mobile, Embedded; ~200k; Desktop]
    Simple, old-school ray tracer: start with C++ code and accelerate the code with heterogeneous systems.
    Serial version:
      void traceScreen() {
        for (int y = 0; y < height; ++y) {
          for (int x = 0; x < width; ++x) {
            Ray ray = generateRay(x, y);
            IntersectableObject *obj = traceRay(ray);
            framebuffer[y][x] = colorPixelForObject(obj);
          }
        }
      }
    Parallel version, with the loop body passed to par_for_2D as a lambda:
      void traceScreen() {
        par_for_2D(height, width, [&](int y, int x) {
          Ray ray = generateRay(x, y);
          IntersectableObject *obj = traceRay(ray);
          framebuffer[y][x] = colorPixelForObject(obj);
        });
      }
  • 11. Moving the Code onto OpenCL 1.x
    Need to make the following changes:
      a) Get rid of all the pointers, both in scene vector and internally in CSGObject
      b) Rewrite the use of std::vector, as OpenCL C does not understand C++ data type internals
      c) Get rid of the virtual function calls
      d) Change the classes to structs
      e) Get rid of recursion in CSGObject
      f) Avoid accessing the global scene variable in accelerated code
      g) Port the code base to OpenCL C
  • 12. Moving the Code onto OpenCL 2
    Need to make the following changes:
      a) Get rid of all the pointers, both in scene vector and internally in CSGObject
      b) Rewrite the use of std::vector, as OpenCL C does not understand C++ data type internals
      c) Get rid of the virtual function calls
      d) Change the classes to structs
      e) Get rid of recursion in CSGObject
      f) Avoid accessing the global scene variable in accelerated code
      g) Port the code base to OpenCL C
    OpenCL 2 solves point a) with a shared address space, but not the rest
  • 13. Moving the Code onto C++ AMP
    Need to make the following changes:
      a) Get rid of all the pointers, both in scene vector and internally in CSGObject
      b) Rewrite the use of std::vector, as C++ AMP cannot call into the C++ standard library
      c) Get rid of the virtual function calls
      d) Change the classes to structs
      e) Get rid of recursion in CSGObject
      f) Avoid accessing the global scene variable in accelerated code
      g) Port the code base to OpenCL C
    C++ AMP solves points d), f) and g), but not the rest
  • 14. Moving the Code onto HSA
    Need to make the following changes:
      a) Get rid of all the pointers, both in scene vector and internally in CSGObject
      b) Rewrite the use of std::vector, as HSAIL does not understand C++ data type internals
      c) Get rid of the virtual function calls
      d) Change the classes to structs
      e) Get rid of recursion in CSGObject
      f) Avoid accessing the global scene variable in accelerated code
      g) Port the code base to a language on top of HSAIL
    HSA solves points a), c), d), e) and soon f)
  • 15. What Makes GPUs Good for Power-Efficient Compute?
    - Relaxed single-threaded performance
        - No dynamic scheduling
        - No branch prediction
        - No register renaming, no result forwarding
        - Longer pipelines
        - Lower clock frequencies
    - Multi-threading
        - Tolerate long latencies to memory
    - Increasing the ALU/control ratio
        - Short vectors exposed to programmers
        - SIMT/warp/VLIW/wavefront-based execution
  • 16. Heterogeneous Compute, Homogeneous Architecture (big.LITTLE)
    - How about a SIMTish ARM?
    - Familiar programming model: C++ and OpenMP
    - Fewer seams
    - Sharing data structures and function pointers/vtables
    [research figure: integer pipe, FP pipe and load/store pipe, write stage, SIMT queue; throughput]
  • 17. Moving the Code onto a Warped ARM
    Need to make the following changes:
      - Get rid of all the pointers, both in scene vector and internally in CSGObject
      - Rewrite the use of std::vector, as OpenCL C does not understand C++ data type internals
      - Get rid of the virtual function calls
      - Change the classes to structs
      - Get rid of recursion in CSGObject
      - Avoid accessing the global scene variable in accelerated code
      - Port the code base to OpenCL C
  • 18. Performance vs Effort
    We’ve implemented SGEMM, a matrix-matrix multiplication benchmark, in various ways to investigate the trade-off between programmer effort and performance payoff:
      SGEMM version                                  Speedup   Effort
      ARM in C                                       1x        Low
      ARM in C with NEON intrinsics, prefetching     15x       Medium-High
      ARM in assembly with NEON, prefetching         26x       High
      SIMTish ARM in C                               35x       Low
      SIMTish ARM in C, unrolled                     44x       Low-Medium
      Mali GPU x 4 way                               136x      High
  • 19. Scale Needs Standards
  • 20. Works for geeks…
    - No proper orchestration
    - Battle for the apps platform
    - Needs home IT support
    - Or only a single manufacturer
    [figure: connected devices over IPv4, Sonosnet and IPv6]
    Imagine that there were 1,000 of these connected devices…
  • 21. Functional Becomes the Internet of Things [figure labels: Functional, Little Data]
  • 22. [figure: "My Data" (Mike) vs. "Their Data"; Gym ✗, Life Insurance ✗, Car Insurance !; Rob Curtis, Haymakers, Cambridge. Picture by Keith Jones]
  • 23. Sharing Needs Trust
  • 24. IoT Medical Devices
    - First implantable pacemaker: 1958
    - Can a pacemaker be hacked to kill? Or is that just a plot line in a US TV series?
    - RF interface for adjusting settings
    - First hacked in 2008: “Sustained effort by a team of specialists” (The New York Times); range a few cm
    - Today: MIT grad students, one weekend, range 50 feet
  • 25. Trust Needs Security
  • 26. It’s a Heterogeneous Future [figure: axes Reach and Applications, spanning Today to The future; layers: Fixed Telephony Networks, Mobile Telephony, Internet / broadband, Mobile internet, M2M, SaaS, Sensors & Actuators Networks, Smart Everything; annotations: Open Data and Objects, Scale Needs Standards, Sharing Needs Trust, Trust Needs Security]
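The two per-area costs quoted on slide 6 imply the CMOS-to-TFT cost ratio directly:

```latex
\frac{\$40{,}000/\mathrm{m}^2}{\$10/\mathrm{m}^2} = 4{,}000
```

That is, the slide's own figures put printed TFTs more than three orders of magnitude below CMOS per unit area, consistent with its headline of a 1,000× lower unit cost.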
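Slide 10 calls a par_for_2D helper without defining it. A minimal sketch of what such a helper might do on a multicore CPU, assuming it simply runs the lambda once per (y, x) pair with the rows split across std::thread workers. The par_for_2D signature, shadePixel and the vector-of-rows framebuffer are hypothetical stand-ins, not the code from the talk.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical par_for_2D: calls body(y, x) for every 0 <= y < height, 0 <= x < width.
// Rows are split into contiguous bands, one band per std::thread.
// body must be safe to call concurrently for distinct (y, x) pairs.
template <typename Body>
void par_for_2D(int height, int width, Body body) {
    assert(height >= 0 && width >= 0);
    unsigned hw = std::thread::hardware_concurrency();
    int numThreads = hw == 0 ? 4 : static_cast<int>(hw);      // 0 means "unknown"
    numThreads = std::max(1, std::min(numThreads, height));   // no more threads than rows
    int rowsPerThread = (height + numThreads - 1) / numThreads; // ceiling division
    std::vector<std::thread> workers;
    for (int t = 0; t < numThreads; ++t) {
        int yBegin = t * rowsPerThread;
        int yEnd = std::min(height, yBegin + rowsPerThread);
        if (yBegin >= yEnd) break;  // all rows already assigned
        workers.emplace_back([&body, yBegin, yEnd, width]() {
            for (int y = yBegin; y < yEnd; ++y)
                for (int x = 0; x < width; ++x)
                    body(y, x);
        });
    }
    for (std::thread& w : workers) w.join();  // wait for every band to finish
}

// Hypothetical stand-in for generateRay/traceRay/colorPixelForObject.
int shadePixel(int y, int x) { return (x * 31 + y * 17) % 256; }

// Parallel traceScreen in the style of slide 10, writing into a row-major framebuffer.
void traceScreen(std::vector<std::vector<int>>& framebuffer) {
    int height = static_cast<int>(framebuffer.size());
    int width = height > 0 ? static_cast<int>(framebuffer[0].size()) : 0;
    par_for_2D(height, width, [&](int y, int x) {
        framebuffer[y][x] = shadePixel(y, x);  // each pixel is written by exactly one call
    });
}
```

Because every pixel is written by exactly one call, no two threads touch the same element; the ray tracer's per-pixel functions would need the same property for the serial-to-parallel rewrite on slide 10 to be safe.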
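Slides 11 to 14 list the changes needed before CSGObject can run as OpenCL C. A minimal sketch of the kind of data layout those changes lead to, assuming a CSG tree of spheres combined by union, intersection and difference (NodeKind, CsgNode and insideCsg are hypothetical names): children are array indices instead of pointers (point a), a kind tag plus a switch replaces virtual calls (point c), nodes are plain structs (point d), and traversal uses explicit stacks instead of recursion (point e).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flattened CSG node: a plain struct with no pointers and no vtable.
enum class NodeKind : std::int32_t { Sphere, Union, Intersection, Difference };

struct CsgNode {
    NodeKind kind;
    float cx, cy, cz, radius;  // sphere parameters; unused for operator nodes
    std::int32_t left, right;  // child indices into the node array, -1 for leaves
};

// Point-membership test for the tree rooted at nodes[root], evaluated in
// post-order with explicit fixed-size stacks instead of recursion.
bool insideCsg(const CsgNode* nodes, int root, float px, float py, float pz) {
    constexpr int kMaxDepth = 64;            // assumed bound on tree depth
    struct Frame { int node; int stage; };   // stage: 0 = visit left, 1 = visit right, 2 = combine
    Frame frames[kMaxDepth];
    bool values[kMaxDepth];                  // results of already-evaluated subtrees
    int fp = 0, vp = 0;
    frames[fp++] = {root, 0};
    while (fp > 0) {
        Frame& f = frames[fp - 1];
        const CsgNode& n = nodes[f.node];
        if (n.kind == NodeKind::Sphere) {
            float dx = px - n.cx, dy = py - n.cy, dz = pz - n.cz;
            assert(vp < kMaxDepth);
            values[vp++] = dx * dx + dy * dy + dz * dz <= n.radius * n.radius;
            --fp;
        } else if (f.stage < 2) {
            int child = f.stage == 0 ? n.left : n.right;
            ++f.stage;                       // come back to this node after the child
            assert(fp < kMaxDepth);
            frames[fp++] = {child, 0};
        } else {
            bool r = values[--vp];
            bool l = values[--vp];
            bool v = false;
            switch (n.kind) {                // replaces a virtual "combine" call
                case NodeKind::Union:        v = l || r;  break;
                case NodeKind::Intersection: v = l && r;  break;
                case NodeKind::Difference:   v = l && !r; break;
                case NodeKind::Sphere:       break;       // handled above
            }
            values[vp++] = v;
            --fp;
        }
    }
    return values[0];
}
```

For example, the array {Difference(1, 4), Union(2, 3), Sphere A, Sphere B, Sphere C} with node 0 as the root encodes (A ∪ B) − C. The fixed-size stacks are the usual price of removing recursion: they assume a maximum tree depth.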
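Slide 15 credits SIMT/warp/wavefront execution with raising the ALU/control ratio. A toy model of the idea, assuming an 8-lane warp and a one-line per-lane program; runWarp, LaneRegs and the step counter are illustrative names, not any GPU's API. One control step drives all lanes, and a divergent branch costs the warp both sides of the if, executed under complementary masks.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy SIMT model: kLanes threads share one instruction stream.
constexpr std::size_t kLanes = 8;
using LaneMask = std::array<bool, kLanes>;
using LaneRegs = std::array<int, kLanes>;

struct WarpStats { int issuedSteps = 0; };  // warp-wide instructions issued

// Per-lane program: out = (in < 0) ? -in : 2 * in
LaneRegs runWarp(const LaneRegs& in, WarpStats& stats) {
    LaneRegs out{};
    LaneMask takeThen{};
    bool anyThen = false, anyElse = false;
    // Step 1: all lanes evaluate the condition in lockstep (one control step).
    for (std::size_t lane = 0; lane < kLanes; ++lane) {
        takeThen[lane] = in[lane] < 0;
        anyThen = anyThen || takeThen[lane];
        anyElse = anyElse || !takeThen[lane];
    }
    ++stats.issuedSteps;
    assert(anyThen || anyElse);  // every lane takes exactly one side
    // Step 2: "then" side, issued once for the warp, masked to lanes with in < 0.
    if (anyThen) {
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            if (takeThen[lane]) out[lane] = -in[lane];
        ++stats.issuedSteps;
    }
    // Step 3: "else" side under the complementary mask.
    if (anyElse) {
        for (std::size_t lane = 0; lane < kLanes; ++lane)
            if (!takeThen[lane]) out[lane] = 2 * in[lane];
        ++stats.issuedSteps;
    }
    return out;
}
```

With mixed signs the warp issues three steps; with uniform inputs the untaken side is skipped and it issues two. This is why branch-heavy code loses efficiency on SIMT hardware even though each lane's result is correct.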
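Slide 16 argues that a SIMTish ARM would keep a familiar C++ and OpenMP programming model, sharing data structures and function pointers/vtables. A sketch of what that would allow, assuming an OpenMP-capable compiler: the scene stays a std::vector of pointers to polymorphic objects (exactly what points a) to d) on slides 11 to 14 ask you to remove) and the parallel loop calls virtual functions directly. Shape, Circle, Square and rasterize are hypothetical stand-ins for the ray tracer's scene code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical polymorphic scene objects, used unchanged inside the parallel loop.
struct Shape {
    virtual ~Shape() = default;
    virtual bool contains(float x, float y) const = 0;
};

struct Circle : Shape {
    float cx, cy, r;
    Circle(float cx_, float cy_, float r_) : cx(cx_), cy(cy_), r(r_) {}
    bool contains(float x, float y) const override {
        float dx = x - cx, dy = y - cy;
        return dx * dx + dy * dy <= r * r;
    }
};

struct Square : Shape {
    float x0, y0, side;
    Square(float x0_, float y0_, float s) : x0(x0_), y0(y0_), side(s) {}
    bool contains(float x, float y) const override {
        return x >= x0 && x < x0 + side && y >= y0 && y < y0 + side;
    }
};

// For each pixel centre, store the index of the first shape containing it, or -1.
void rasterize(const std::vector<std::unique_ptr<Shape>>& scene,
               std::vector<int>& framebuffer, int height, int width) {
    assert(framebuffer.size() == static_cast<std::size_t>(height) * width);
    // Without OpenMP support the pragma is ignored and the loop runs serially.
    #pragma omp parallel for collapse(2)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int hit = -1;
            for (std::size_t i = 0; i < scene.size(); ++i) {
                if (scene[i]->contains(x + 0.5f, y + 0.5f)) {  // virtual call in the hot loop
                    hit = static_cast<int>(i);
                    break;
                }
            }
            framebuffer[y * width + x] = hit;  // one writer per pixel
        }
    }
}
```

The design point is that nothing in the scene representation had to change to parallelize the loop; the slide's claim is that a homogeneous ISA makes this possible for the accelerated part too, which this CPU-only sketch can only illustrate, not demonstrate.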
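Slide 18 uses SGEMM as its benchmark. SGEMM computes C = αAB + βC in single precision. A plain C++ sketch of those semantics, assuming row-major storage; the sgemm signature here is hypothetical (it is neither the cblas_sgemm interface nor the benchmarked code) and corresponds roughly to an "ARM in C" starting point rather than the tuned NEON or GPU versions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// C = alpha * A * B + beta * C, row-major; A is m x k, B is k x n, C is m x n.
void sgemm(std::size_t m, std::size_t n, std::size_t k, float alpha,
           const std::vector<float>& A, const std::vector<float>& B,
           float beta, std::vector<float>& C) {
    assert(A.size() == m * k && B.size() == k * n && C.size() == m * n);
    for (std::size_t i = 0; i < m; ++i) {
        // Scale row i of C by beta; as in reference BLAS, beta == 0 means the
        // previous contents of C are ignored (so stale NaNs do not propagate).
        for (std::size_t j = 0; j < n; ++j)
            C[i * n + j] = (beta == 0.0f) ? 0.0f : beta * C[i * n + j];
        // i-k-j loop order: the inner loop walks rows of B and C contiguously,
        // which suits caches and compiler auto-vectorization better than i-j-k.
        for (std::size_t p = 0; p < k; ++p) {
            const float a = alpha * A[i * k + p];
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[p * n + j];
        }
    }
}
```

For a 2×3 A = [1 2 3; 4 5 6] and a 3×2 B = [7 8; 9 10; 11 12] with α = 1 and β = 0, the result is C = [58 64; 139 154]. The speedups in the table come from what this sketch leaves out: blocking, NEON vectorization, prefetching and, for the SIMTish and Mali rows, different hardware.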
