Keynote presentation, Is There Anything New in Heterogeneous Computing, by Mike Muller, Chief Technology Officer, ARM, at the AMD Developer Summit (APU13), Nov. 11-13, 2013.
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...AMD Developer Central
Presentation CC-4005, Performance analysis of 3D Finite Difference computational stencils on Seamicro fabric compute systems, by Joshua Mora from the AMD Developer Summit (APU13) November 2013.
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...AMD Developer Central
Presentation HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated Processing Units, by Robert Engel at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...AMD Developer Central
Presentation MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabilities, by Srikanth Gollapudi at the AMD Developer Summit (APU13) November 11-13, 2013.
CC-4005, Performance analysis of 3D Finite Difference computational stencils ...AMD Developer Central
Presentation CC-4005, Performance analysis of 3D Finite Difference computational stencils on Seamicro fabric compute systems, by Joshua Mora from the AMD Developer Summit (APU13) November 2013.
HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated ...AMD Developer Central
Presentation HC-4021, Efficient scheduling of OpenMP and OpenCL™ workloads on Accelerated Processing Units, by Robert Engel at the AMD Developer Summit (APU13) Nov. 11-13, 2013.
MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabi...AMD Developer Central
Presentation MM-4092, Optimizing FFMPEG and Handbrake Using OpenCL and Other AMD HW Capabilities, by Srikanth Gollapudi at the AMD Developer Summit (APU13) November 11-13, 2013.
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...AMD Developer Central
Presentation CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distributed Platforms, by Max Grossman at the AMD Developer Summit (APU13) November 11-13, 2013.
This presentation describes the components of GPU ecosystem for compute, provides overview of existing ecosystems, and contains a case study on NVIDIA Nsight
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAMD Developer Central
This deck presents highlights from the Introduction to OpenCL™ Programming Webinar presented by Acceleware & AMD on Sept. 17, 2014. Watch a replay of this popular webinar on the AMD Dev Central YouTube channel here: https://www.youtube.com/user/AMDDevCentral or here for the direct link: http://bit.ly/1r3DgfF
This presentation accompanies the webinar replay located here: http://bit.ly/1zmvlkL
AMD Media SDK Software Architect Mikhail Mironov shows you how to leverage an AMD platform for multimedia processing using the new Media Software Development Kit. He discusses how to use a new set of C++ interfaces for easy access to AMD hardware blocks, and shows you how to leverage the Media SDK in the development of video conferencing, wireless display, remote desktop, video editing, transcoding, and more.
AMD’s math libraries can support a range of programmers from hobbyists to ninja programmers. Kent Knox from AMD’s library team introduces you to OpenCL libraries for linear algebra, FFT, and BLAS, and shows you how to leverage the speed of OpenCL through the use of these libraries.
Review the material presented in the AMD Math libraries webinar in this deck.
For more:
Visit the AMD Developer Forums:http://devgurus.amd.com/welcome
Watch the replay: www.youtube.com/user/AMDDevCentral
Follow us on Twitter: https://twitter.com/AMDDevCentral
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...AMD Developer Central
Presentation PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by Wu Feng and Mark Gardner at the AMD Developer Summit (APU13) November 11-13, 2013.
CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distribu...AMD Developer Central
Presentation CC-4000, Characterizing APU Performance in HadoopCL on Heterogeneous Distributed Platforms, by Max Grossman at the AMD Developer Summit (APU13) November 11-13, 2013.
This presentation describes the components of GPU ecosystem for compute, provides overview of existing ecosystems, and contains a case study on NVIDIA Nsight
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAMD Developer Central
This deck presents highlights from the Introduction to OpenCL™ Programming Webinar presented by Acceleware & AMD on Sept. 17, 2014. Watch a replay of this popular webinar on the AMD Dev Central YouTube channel here: https://www.youtube.com/user/AMDDevCentral or here for the direct link: http://bit.ly/1r3DgfF
This presentation accompanies the webinar replay located here: http://bit.ly/1zmvlkL
AMD Media SDK Software Architect Mikhail Mironov shows you how to leverage an AMD platform for multimedia processing using the new Media Software Development Kit. He discusses how to use a new set of C++ interfaces for easy access to AMD hardware blocks, and shows you how to leverage the Media SDK in the development of video conferencing, wireless display, remote desktop, video editing, transcoding, and more.
AMD’s math libraries can support a range of programmers from hobbyists to ninja programmers. Kent Knox from AMD’s library team introduces you to OpenCL libraries for linear algebra, FFT, and BLAS, and shows you how to leverage the speed of OpenCL through the use of these libraries.
Review the material presented in the AMD Math libraries webinar in this deck.
For more:
Visit the AMD Developer Forums:http://devgurus.amd.com/welcome
Watch the replay: www.youtube.com/user/AMDDevCentral
Follow us on Twitter: https://twitter.com/AMDDevCentral
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...AMD Developer Central
Presentation PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by Wu Feng and Mark Gardner at the AMD Developer Summit (APU13) November 11-13, 2013.
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2021/02/introduction-to-the-tvm-open-source-deep-learning-compiler-stack-a-presentation-from-octoml/
Luis Ceze, Co-founder and CEO of OctoML, a Professor in the Paul G. Allen School of Computer Science and Engineering at the University of Washington, and Venture Partner at Madrona Venture Group, presents the “Introduction to the TVM Open Source Deep Learning Compiler Stack” tutorial at the September 2020 Embedded Vision Summit.
There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms — such as mobile phones, embedded devices, and accelerators — requires significant manual effort.
In this talk, Ceze presents his work on the TVM stack, which exposes graph- and operator-level optimizations to provide performance portability for deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of optimizations.
OrientDB uses Hazelcast to achieve a distributed configuration with multi-master replication. By using these together you can scale up horizontally by adding new servers without stopping or reconfigure the cluster. In this webinar, you’ll be introduced to OrientDB and how it compares to other NoSQL DBMS. You will also learn how and why Hazelcast is being used with OrientDB to achieve scale, elasticity and performance.
Both Hazelcast and Orient Technologies are providing professional open source support to their respective projects under the Apache software license.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Takes the reader through the various components of windowing systems, and how to develop and benchmark various Graphics applications using OpenGL and other toolsets. Also includes a Cheatsheet that covers various terminologies used in the Graphics world.
Edge optimized architecture for fabric defect detection in real-timeShuquan Huang
In textile industry, fabric defect relies on human inspection traditionally, which is inaccurate, inconsistent, inefficient and expensive. There were automatic systems developed on the defect detection by identifying the faults in fabric surface using the image and video processing techniques. However, the existing solution has insufficiencies in defect data sharing, backhaul interconnect, maintenance and etc. By evolving to an edge-optimized architecture, we can help textile industry improve fabric quality, reduce operation cost and increase production efficiency. In this session, I’ll share:
What’s edge computing and why it’s important to intelligence manufacturing
What’s the characteristics, strengths and weaknesses of traditional fabric defect detection method
Why textile industry can benefit from edge computing infrastructure
How to design and implement an edge-enabled application for fabric defect detection in real-time
Insights, synergy and future research directions
IMAGE CAPTURE, PROCESSING AND TRANSFER VIA ETHERNET UNDER CONTROL OF MATLAB G...Christopher Diamantopoulos
This implemented DSP system utilizes TCP socket communication. Upon message reception, it decides the appropriate process to be executed based on cases which can be categorized as follows:
1) image capture
2) image transfer
3) image processing
4) sensor calibration
A user-friendly MATLAB GUI, named DIPeth, facilitates the system's control.
Use C++ and Intel® Threading Building Blocks (Intel® TBB) for Hardware Progra...Intel® Software
In this presentation, we focus on an alternative approach that uses nodes that contain Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors. Programming models and the development tools are identical for these resources, greatly simplifying development. We discuss how the same models for vectorization and threading can be used across these compute resources to create software that performs well on them. We further propose an extension to the Intel® Threading Building Blocks (Intel® TBB) flow graph interface that enables intra-node distributed memory programming, simplifying communication, and load balancing between the processors and coprocessors. Finally, we validate this approach by presenting a benchmark of a risk analysis implementation that achieves record-setting performance.
Scalability for All: Unreal Engine* 4 with Intel Intel® Software
Unreal Engine* 4 is a high-performance game engine for game developers. Learn how Intel and Epic Games* worked together to improve engine performance both for CPUs and GPUs and how developers can take advantage of it.
Vulkan and DirectX12 share many common concepts, but differ vastly from the APIs most game developers are used to. As a result, developing for DX12 or Vulkan requires a new approach to graphics programming and in many cases a redesign of the Game Engine. This lecture will teach the basic concepts common to Vulkan and DX12 and help developers overcome the main problems that often appear when switching to one of the new APIs. It will explain how those new concepts will help games utilize the hardware more efficiently and discuss best practices for game engine development.
For more, visit http://developer.amd.com/
This is the slide deck from the popular "Introduction to Node.js" webinar with AMD and DevelopIntelligence, presented by Joshua McNeese. Watch our AMD Developer Central YouTube channel for the replay at https://www.youtube.com/user/AMDDevCentral.
Learn more about DirectGMA in this blog post: bit.ly/AMDDirectGMA
AMD has introduced Direct Graphics Memory Access in order to:
‒ Makes a portion of the GPU memory accessible to other devices
‒ Allows devices on the bus to write directly into this area of GPU memory
‒ Allows GPUs to write directly into the memory of remote devices on the bus supporting DirectGMA
‒ Provides a driver interface to allow 3rd party hardware vendors to support data exchange with an AMD GPU using DirectGMA
‒ and more
View the accompanying blog post here: bit.ly/AMDDirectGMA
This Webinar explores a variety of new and updated features in Java 8, and discuss how these changes can positively impact your day-to-day programming.
Watch the video replay here: http://bit.ly/1vStxKN
Your Webinar presenter, Marnie Knue, is an instructor for Develop Intelligence and has taught Sun & Oracle certified Java classes, RedHat JBoss administration, Spring, and Hibernate. Marnie also has spoken at JavaOne.
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...AMD Developer Central
This presentation discusses the Mantle API, what it is, why choose it, and abstraction level, small batch performance and platform efficiency.
Download the presentation from the AMD Developer website here: http://bit.ly/TrEUeC
Inside XBox One by Martin Fuller from the Sweden Game Developers Conference, June 2, 2014, Stockholm, Sweden. View other presentations here: http://bit.ly/TrEUeC
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...AMD Developer Central
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Harris Gasparakis, AMD, at the Embedded Vision Alliance Summit, May 2014.
Harris Gasparakis, Ph.D., is AMD’s OpenCV manager. In addition to enhancing OpenCV with OpenCL acceleration, he is engaged in AMD’s Computer Vision strategic planning, ISVs, and AMD Ventures engagements, including technical leadership and oversight in the AMD Gesture product line. He holds a Ph.D. in theoretical high energy physics from YITP at SUNYSB. He is credited with enabling real-time volumetric visualization and analysis in Radiology Information Systems (Terarecon), including the first commercially available virtual colonoscopy system (Vital Images). He was responsible for cutting edge medical technology (Biosense Webster, Stereotaxis, Boston Scientific), incorporating image and signal processing with AI and robotic control.
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
In this webinar presentation, ArrayFire COO Oded Green demonstrates best practices to help you quickly get started with OpenCL™ programming. Learn how to get the best performance from AMD hardware in various programming languages using ArrayFire. Oded discusses the latest advancements in the OpenCL™ ecosystem, including cutting edge OpenCL™ libraries such as clBLAS, clFFT, clMAGMA and ArrayFire. Examples are shown in real code for common application domains.
Watch the webinar here: http://bit.ly/1obT0M2
For more developer resources, visit:
http://arrayfire.com/
http://developer.amd.com/
Follow us on Twitter: https://twitter.com/AMDDevCentral
See info in the slides for more contact information and resource links!
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14AMD Developer Central
Johan Andersson will show how the Frostbite 3 game engine is using the low-level graphics API Mantle to deliver significantly improved performance in Battlefield 4 on PC and future games from Electronic Arts in this presentation from the 2014 Game Developers Conference in San Francisco March 17-21. Also view this and other presentations on our developer website at http://developer.amd.com/resources/documentation-articles/conference-presentations/
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14AMD Developer Central
Learn more about how AMD’s RapidFire SDK simplifies the delivery of multi-game streaming from a single GPU while minimizing latency to ensure one of the best cloud gaming experiences in this presentation from the 2014 Game Developers Conference in San Francisco March 17-21. Also view this and other presentations on our developer website at http://developer.amd.com/resources/documentation-articles/conference-presentations/
Mantle and Nitrous - Combining Efficient Engine Design with a modern API - AM...AMD Developer Central
Oxide Games Partners Dan Baker and Tim Kipp will show you how to build a high throughput renderer using the Mantle API in this AMD technology presentation from the 2014 Game Developers Conference in San Francisco March 17-21. Also view this and other presentations on our developer website at http://developer.amd.com/resources/documentation-articles/conference-presentations/
This AMD technology presentation from the 2014 Game Developers Conference in San Francisco March 17-21 explains how Mantle features can enable developers to improve both CPU and GPU performance in their titles. Also view this and other presentations at http://developer.amd.com/resources/documentation-articles/conference-presentations/
A look at how new Direct3D advancements enhance efficiency and enable fully-threaded building of command buffers in this prentation from the 2014 Game Developers Conference in San Francisco March 17-21. Also view this and other presentations on our developer website at http://developer.amd.com/resources/documentation-articles/conference-presentations/
Bill explains some of the ways that the Vertex Shader can be used to improve performance by taking a fast path through the Vertex Shader rather than generating vertices with other parts of the pipeline in this AMD technology presentation from the 2014 Game Developers Conference in San Francisco March 17-21. Check out more technical presentations at http://developer.amd.com/resources/documentation-articles/conference-presentations/
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
In today's fast-changing business world, Companies that adapt and embrace new ideas often need help to keep up with the competition. However, fostering a culture of innovation takes much work. It takes vision, leadership and willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
State of ICS and IoT Cyber Threat Landscape Report 2024 previewPrayukth K V
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio, cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio also runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors, and newer malware including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
State of global ICS asset and network exposure
Sectoral targets and attacks as well as the cost of ransom
Global APT activity, AI usage, actor and tactic profiles, and implications
Rise in volumes of AI-powered cyberattacks
Major cyber events in 2024
Malware and malicious payload trends
Cyberattack types and targets
Vulnerability exploit attempts on CVEs
Attacks on counties – USA
Expansion of bot farms – how, where, and why
In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
Why are attacks on smart factories rising?
Cyber risk predictions
Axis of attacks – Europe
Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
6. Printing and Imprinting Thin Film Transistors (TFT)
Can be transparent, bio-degradable and even ingestible
Unit cost 1000 less than mainstream CMOS
CMOS @ $40,000/m2 vs. TFT @ $10/m2
Printing CAPEX can be less than $1,000
350dpi = 200um @ 20 m/s
Can print batteries, antenna
Mainly organic at ~20 volts
Imprint CAPEX a $2M DVD press is high volume
Better controllability hence higher density and performance
1um today scale to 50nm features as used today for BluRay discs
Mainly Inorganic NMOS only at ~2 volts
9. Is There Anything New in Heterogeneous Computing?
Vector Add
Reduction
Matrix Mul
GPU OpenCL on GPU
1.00
1.00
1.00
GPU OpenCL on FPGA
0.14
0.02
0.89
FPGA OpenCL on FPGA
1.71
1.62
31.85
1998
Manual Partitioning
C & Assembler
ARM
+
DSP
2013
Manual Partitioning
C++ & OpenCL/RenderScript
ARM
+
GPU
10. How Do People Program?
~20M Programmers
Web
Mobile
Embedded
~200k
Desktop
Simple, old-school ray tracer
Start with C++ code and accelerate the code with Heterogeneous Systems
void traceScreen()
{
for(y = 0; y < height; ++y) {
for(x = 0; x < width; ++x){
Ray ray = generateRay(x, y);
IntersectableObject *obj = traceRay(ray);
framebuffer[y][x] = colorPixelForObject(obj);
}
}
}
void traceScreen()
{
par_for_2D(height, width, [&](int y, int x) {
Ray ray = generateRay(x, y);
IntersectableObject *obj = traceRay(ray);
framebuffer[y][x] = colorPixelForObject(obj);
});
}
11. Moving the Code onto OpenCL 1.x
Need to make the following changes
a)
b)
c)
d)
e)
f)
g)
Get rid of all the pointers, both in scene vector and internally in CSGObject
Rewrite the use of std::vector, as OpenCL C does not understand C++ data type internals
Get rid of the virtual function calls
Change the classes to structs
Get rid of recursion in CSGObject
Avoid accessing the global scene variable in accelerated code
Port the code base to OpenCL C
12. Moving the Code onto OpenCL 2
Need to make the following changes
a)
b)
c)
d)
e)
f)
g)
Get rid of all the pointers, both in scene vector and internally in CSGObject
Rewrite the use of std::vector, as OpenCL C does not understand C++ data type internals
Get rid of the virtual function calls
Change the classes to structs
Get rid of recursion in CSGObject
Avoid accessing the global scene variable in accelerated code
Port the code base to OpenCL C
OpenCL 2 solves point a) with shared address space, but not the rest
13. Moving the Code onto C++ AMP
Need to make the following changes
a)
b)
c)
d)
e)
f)
g)
Get rid of all the pointers, both in scene vector and internally in CSGObject
Rewrite the use of std::vector, as C++ AMP cannot call into C++ standard library
Get rid of the virtual function calls
Change the classes to structs
Get rid of recursion in CSGObject
Avoid accessing the global scene variable in accelerated code
Port the code base to OpenCL C
C++ AMP solves points d), f) and g), but not the rest
14. Moving the Code onto HSA
Need to make the following changes
a)
b)
c)
d)
e)
f)
g)
Get rid of all the pointers, both in scene vector and internally in CSGObject
Rewrite the use of std::vector, as HSAIL does not understand C++ data type internals
Get rid of the virtual function calls
Change the classes to structs
Get rid of recursion in CSGObject
Avoid accessing the global scene variable in accelerated code
Port the code base to a language on top of HSAIL
HSA solves points a), c), d), e) and soon f)
15. What Makes GPUs Good For Power Efficient Compute?
Relaxed single-threaded performance
No dynamic scheduling
No branch prediction
No register renaming, no result forwarding
Longer pipelines
Lower clock frequencies
Multi-threading
Tolerate long latencies to memory
Increasing the ALU/control ratio
Short-vectors exposed to programmers
SIMT/Warp/VLIW/Wavefront based execution
16. ..
Heterogeneous Compute Homogeneous Architecture
big
LITTLE
How about a SIMTish ARM?
Familiar programming model, C++ and OpenMP
Fewer seams
Sharing data structures and function pointers/vtables
Integer Pipe
FP Pipe
Load/Store Pipe
Write
SIMT
Queue
RESEARCH
Throughput
17. Moving the Code onto a Warped ARM
Need to make the following changes
Get rid of all the pointers, both in scene vector and internally in CSGObject
Rewrite the use of std::vector, as OpenCL C does not understand C++ data type internals
Get rid of the virtual function calls
Change the classes to structs
Get rid of recursion in CSGObject
Avoid accessing the global scene variable in accelerated code
Port the code base to OpenCL C
18. Performance vs Effort
We’ve implemented SGEMM, a matrix-matrix multiplication benchmark, in various
ways, to investigate the tradeoff between programmer effort and performance payoff
SGEMM version
ARM in C
Speedup
Effort
1x
Low
ARM in C with NEON intrinsics, prefetching
15x
Medium - High
ARM in assembly with NEON, prefetching
26x
High
SIMTish ARM in C
35x
Low
SIMTish ARM in C, unrolled
44x
Low - Medium
Mali GPU x 4 way
136x
High
20. Works for geeks…
No proper orchestration
Battle for the apps platform
Needs home IT support
Or only single manufacturer
IPv4
Sonosnet
IPv6
Imagine that there
were a 1000 of these
connected devices….
24. IOT Medical Devices
First implantable Pacemaker 1958
Can a pacemaker be hacked to kill?
Or just a plot line in US TV series
RF interface for adjusting settings
First hacked in 2008
“Sustained effort by a team of specialists” – The New York Times
Range a few cm
Today
MIT grad students
One weekend
Range 50 feet
26. It’s a Heterogeneous Future
Reach
The future
Open Data
and Objects
Scale Needs Standards
Sharing Needs Trust
Trust Needs Security
Applications
Mobile internet
Internet / broadband
M2M
SaaS
Fixed Telephony Networks
Smart
Everything
Sensors & Actuators
Networks
Today
Mobile Telephony