SlideShare a Scribd company logo
1 of 64
Download to read offline
Software Libraries
for CUDA & OpenCL
Heterogeneous Computing is Hard
Two Examples:
1. Median Filtering

2. Local Windowing
Median Filtering
Increasingly
Difficult
Local Windowing
 Best algorithm to use changes given which
device is in the system.
Device 1

Device 2

Device 3

Device 4

Algorithm 1

395 ms

599

244

102

Algorihm 2

270

703

241

103

Algorithm 3

699

407

138

116

Algorithm 4

380

522

202

98
Why Software Libraries Are Great
 Reduce many lines of code to one line
 Obsessively tuned by experts; faster than DIY

 Well-tested and maintained
 Continuously improving
Five Influencers (besides price)

Performance

Portability

Programmability

Scalability

Community
Performance & Programmability

Faster

Slower

SSE or
AVX
Timeconsuming

Easy-to-use
Performance & Programmability

Faster

Writing
Kernels

Slower

SSE or
AVX
Timeconsuming

Easy-to-use
Performance & Programmability

Faster

Writing
Kernels

Slower

SSE or
AVX

Compiler
Directives

Timeconsuming

Easy-to-use
Performance & Programmability

Faster

Writing
Kernels

Using
Libraries

Slower

SSE or
AVX

Compiler
Directives

Timeconsuming

Easy-to-use
Performance
Performance
Portability
 Flavors of portability
 HW vendor options
 Accelerator options (GPU, coprocessor, FPGA)
 CPU fallback
 High-performance mobile computing

 Libraries can provide portability
Scalability
 Always start with one device
 Potential headaches of adding devices
 Performance hit
 Development complexity

 Libraries can make scaling easy
Community
 What do you do when bugs arise?
 Continuous refinement

 Someone to answer questions
 Libraries can have great community support
TIME

TIME

Benefits of Using a Library

Porting

Libraries eliminate
hidden costs of software
development

Maintenance
Documentation
Test and QA
Development

COST

Pain

COST

Pleasure
ArrayFire: Technical Computing
Performance & Programmability
 Super easy to program
 Highly optimized
Portability
Scalability
 Multi-GPU is 1-line of code
array *y = new array[n];
for (int i = 0; i < n; ++i) {
deviceset(i);
// change GPUs
array x = randu(5,5);
// add work to GPU’s queue
y[i] = fft(x);
// more work in queue
}
// all GPUs are now computing simultaneously
Community
 Over 8,000 posts at
http://forums.accelereyes.com
 Nightly library update releases
 Stable releases a few times a year
 v2.0 coming at the end of summer
Example Case Studies 1

17X

20X

20X

45X

12X

Neuro-imaging

Viral Analyses

Video Processing

Radar Imaging

Medical Devices

Georgia Tech

CDC

Google

System Planning

Spencer Tech
Example Case Studies 2

5X

35X

17X

70X

35X

Weather Models

Power Eng

Surveillance

Drug Delivery

Bioinformatics

NCAR

IIT India

BAE Systems

Georgia Tech

Leibnitz
Hundreds of Functions
reductions
• sum, min, max, count,
prod
• vectors, columns,
rows, etc

dense linear algebra
• LU, QR, Cholesky, SVD,
Eigenvalues, Inversion,
Solvers, Determinant,
Matrix Power

convolutions
• 2D, 3D, ND
FFTs
• 2D, 3D, ND

image processing
• filter, rotate, erode,
dilate, morph,
resize, rgb2gray,
histograms

interpolate & scale
• vectors, matrices
• rescaling
sorting
• along any
dimension
• sort detection

and many more…
Intuitive Functions (estimate π)
#include <stdio.h>
#include <arrayfire.h>
using namespace af;
int main() {
// 20 million random samples
int n = 20e6;
array x = randu(n,1), y = randu(n,1);
// how many fell inside unit circle?
float pi = 4 * sum<float>(x*x + y*y < 1) / n;
printf("pi = %gn", pi);
return 0;
}
array x = randu(n, f32);
array y = randu(n, f64);
array z = randu(n, u32);

Data Types

c32

complex
single precision

f64
real
double precision

b8

boolean byte

array

f32

real
single precision

container object

s32

u32

signed integer

unsigned integer

c64
complex
double precision
ND Support
vectors

matrices

volumes

… ND
Subscripting
ArrayFire Keywords: end, span
A(1,1)

A(1,span)

A(end,1)

A(end,span)

A(span,span,2)
Generate Arrays

constant(0,3)
constant(1,3,2,f64)
randu(1,8)
randn(2,2)
identity(3,3)
randu(5,7,c32)

//
//
//
//
//
//

3-by-1 column of zeros, single-precision
3-by-2 matrix, double-precision
row vector (1x8) of random values (uniform)
square matrix (2x2) random values (normal)
3-by-3 identity
complex random values
Create Arrays from CPU Data

float hA[] = {0,1,2,3,4,5};
array A(2,3,hA); // 2x3 matrix, single-precision
print(A);
// A = [ 0 2 4 ]
//
[ 1 3 5 ]

Note: Fortran storage order
Arithmetic

array R = randu(3,3);
array C = constant(1,3,3) + complex(sin(R));
// rescale complex values to unit circle
array a = randn(5,c32);
print(a / abs(a));

// C is c32
L-2 Norm Example

// calculate L-2 norm of
sqrt(sum(pow(X, 2)))
sqrt(sum(pow(X, 2), 0))
sqrt(sum(pow(X, 2), 1))

every column
// norm of every column vector
// ..same
// norm of every row vector
Subscripting Examples

array A = randu(3,3);
array a1 = A(0);
//
array a2 = A(0,1); //
A(1,span);
//
A.row(end);
//
A.cols(1,end);
//

first element
first row, second column
second row
last row
all but first column
Subscripting Examples

float b_ptr[] = {0,1,2,3,4,5,6,7,8,9};
array b(1,10,b_ptr);
b(seq(3));
// {0,1,2}
b(seq(1,7));
// {1,2,3,4,5,6,7}
b(seq(1,2,7));
// {1,3,5,7}
b(seq(0,2,end)); // {0,2,4,6,8}
Data Manipulation

// setting entries to a constant
A(span) = 4;
// fill entire array
A.row(0) = -1;
// first row
A(seq(3)) = 3.1415; // first three elements
Data Manipulation

// copy in another matrix
array B = constant(1,4,4,f64);
B.row(0) = randu(1,4,f32); // set row (upcast)
Data Manipulation

// index with another array
float h_inds[] = {0, 4, 2, 1}; // zero-based
array inds(1,4,h_inds);
B(inds) = randu(4,1); // set to random
Linear Algebra

// matrix factorization
array L, U;
lu(L, U, randu(n,n));
// linear systems: A x = b
array A = randu(n,n), b = randu(n,1);
array x = solve(A,b);
Graphics Functions
 asynchronous
 non-blocking

 throttled at 35 Hz
Graphics Functions
 non-blocking primitives
 surface - surface plotting (2d data)
 image - intensity image visualization
 arrows - vector fields
 plot2 - line plotting (x,y)
 plot3 - scatter plot (x,y,z)
 volume - volume rendering for 3d data
Graphics Functions
 utility commands









keep_on
keep_off
subfigure
palette
clearfig
draw (blocking)
figure
title
close
Graphics Example
#include <arrayfire.h>
using namespace af;
int main() {
// random 3d surface
const int n = 256;
while (1) {
array x = randu(n,n);
// 3d surface plot
surface(x);
}
return 0;
}
GFOR Parallel Loops
Parallel matrix multiplications (1 kernel launch)
gfor (array i, 3)
C(span,span,i) = A(span,span,i) * B;

=

C(,,1)

*

A(,,1)

=

B

C(,,2)

=

*

A(,,2)

B

C(,,3)

*

A(,,3)

B
GFOR Parallel Loops
Parallel matrix multiplications (1 kernel launch)
gfor (array i, 3)
C(span,span,i) = A(span,span,i) * B;

=
C(,,1:3)

=

*
=
A(,,1:3)

*

*
B
GFOR Parallel Loops
Parallel matrix multiplications (1 kernel launch)
gfor (array i, 3)
C(span,span,i) = A(span,span,i) * B;

=

C

*

A

B
Four Quick Stories in Conclusion
Advertising

Healthcare

Finance

Oil & Gas
Virtual Glasses Try-On
Acceleration Demands
 The CPU code
 45 seconds for one session to complete
 Highly optimized OpenMP code leveraging all cores

 1,000 sessions/minute required 750 CPU nodes

 Convert Mac-only research code to C#
 Focus on efficiently developed robust performance
ArrayFire Solution
 Linear algebra
 Matrix multiple, Transpose
 Linear solvers

 Image processing






Convolutions
Fast Fourier Transform
Correlation Filter
Sobel Filter
Gaussian Blur

 OpenCV functions
 Custom edge detection

 Graphics
 Rendering points

 Reductions
 Min, Max, Sum

 JIT
 Increased productivity
Results
 3X acceleration
 Dropped from 750 nodes,
to 250 nodes
 Benefit from ongoing
library support
Culture-Free Microbiology
Computercontrolled
pipettes
 Filling

 Filled
Microscope
 A computer-controlled microscope scans a
cassette of pipettes, changes imaging
modes, and acquires digital images
according to program
Acceleration Demands
 This platform provides a rapid alternative to
traditional cell culturing for susceptibility testing
 The faster the analysis pipeline, the sooner a
patient can be diagnosed and treated with an
antibiotic
 Culture-based methods can take 2-3 days, which
is problematic for many critically ill patients
ArrayFire Solution
 Image Processing
 Heavily filter based

 Convolve, Filter, Resize

 Image Statistics
 Mean, StdDev, Variance
Results
 Realtime throughput
Kernel

Speedup

Image Registration (Heavy use of
statistics functions)

73.17x

Custom Filter (Prep Center Image) 26.48x

Gaussian Blur

2.19x
Hedge Protection System
Acceleration Demands
 CPU-only version was taking 115 hours
 Needs to run entire database of portfolios
each night before trading begins next day
ArrayFire Solution
 Statistics Functions
 Random number
generation
 Variance

 Exponentials

 Arithmetic
 Sqrt

 Element-wise math

 Reductions
 Sum
Results
 GPU version drops runtime to 7 hours and
meets the requirement to run overnight
 Time left over to try more permutations
Oil Well Monitoring
 Ordinary telecom
fiber used as an
efficient, high fidelity
acoustic sensor
 Threaded along the
length of oil well
Acceleration Demands
 Require realtime signal processing from 24
channels per unit with an onsite server
 CPU-only solution was 5x slower than realtime
ArrayFire Solution
 Heavy usage of signal filtering functions
 FIR

 IIR
Results
 6x performance improvements in signal
processing
 20x overall performance improvement
through more efficiently vectorized code
Software Shop for CUDA & OpenCL
 Two ways to work with us:
 Use

 Hire our CUDA & OpenCL developers
Code development; CUDA & OpenCL training

More Related Content

What's hot

GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practicesLior Sidi
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architectureDhaval Kaneria
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersKazuaki Ishizaki
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Ural-PDC
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDASavith Satheesh
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...AMD Developer Central
 
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...AMD Developer Central
 
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...AMD Developer Central
 
PL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil HenningPL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil HenningAMD Developer Central
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14AMD Developer Central
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDAMartin Peniak
 
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl HilleslandPG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl HilleslandAMD Developer Central
 
How data rules the world: Telemetry in Battlefield Heroes
How data rules the world: Telemetry in Battlefield HeroesHow data rules the world: Telemetry in Battlefield Heroes
How data rules the world: Telemetry in Battlefield HeroesElectronic Arts / DICE
 
OpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering TechniquesOpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering TechniquesNarann29
 

What's hot (20)

GPU and Deep learning best practices
GPU and Deep learning best practicesGPU and Deep learning best practices
GPU and Deep learning best practices
 
Gpu with cuda architecture
Gpu with cuda architectureGpu with cuda architecture
Gpu with cuda architecture
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java Programmers
 
Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?Can FPGAs Compete with GPUs?
Can FPGAs Compete with GPUs?
 
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
Applying of the NVIDIA CUDA to the video processing in the task of the roundw...
 
GPGPU programming with CUDA
GPGPU programming with CUDAGPGPU programming with CUDA
GPGPU programming with CUDA
 
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...Productive OpenCL Programming An Introduction to OpenCL Libraries  with Array...
Productive OpenCL Programming An Introduction to OpenCL Libraries with Array...
 
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
 
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...
HC-4012, Complex Network Clustering Using GPU-based Parallel Non-negative Mat...
 
Cuda
CudaCuda
Cuda
 
PL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil HenningPL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
 
Cuda
CudaCuda
Cuda
 
GPU Programming
GPU ProgrammingGPU Programming
GPU Programming
 
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
Vertex Shader Tricks by Bill Bilodeau - AMD at GDC14
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDA
 
CUDA
CUDACUDA
CUDA
 
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl HilleslandPG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
PG-4034, Using OpenGL and DirectX for Heterogeneous Compute, by Karl Hillesland
 
Low-level Graphics APIs
Low-level Graphics APIsLow-level Graphics APIs
Low-level Graphics APIs
 
How data rules the world: Telemetry in Battlefield Heroes
How data rules the world: Telemetry in Battlefield HeroesHow data rules the world: Telemetry in Battlefield Heroes
How data rules the world: Telemetry in Battlefield Heroes
 
OpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering TechniquesOpenGL 4.4 - Scene Rendering Techniques
OpenGL 4.4 - Scene Rendering Techniques
 

Similar to PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos

Introduction to Deep Learning and Tensorflow
Introduction to Deep Learning and TensorflowIntroduction to Deep Learning and Tensorflow
Introduction to Deep Learning and TensorflowOswald Campesato
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And EffectsThomas Goddard
 
Alerting mechanism and algorithms introduction
Alerting mechanism and algorithms introductionAlerting mechanism and algorithms introduction
Alerting mechanism and algorithms introductionFEG
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Gabriel Moreira
 
Large volume data analysis on the Typesafe Reactive Platform
Large volume data analysis on the Typesafe Reactive PlatformLarge volume data analysis on the Typesafe Reactive Platform
Large volume data analysis on the Typesafe Reactive PlatformMartin Zapletal
 
FPGA Implementation of a GA
FPGA Implementation of a GAFPGA Implementation of a GA
FPGA Implementation of a GAHocine Merabti
 
NeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsNeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsMohid Nabil
 
Towards neuralprocessingofgeneralpurposeapproximateprograms
Towards neuralprocessingofgeneralpurposeapproximateprogramsTowards neuralprocessingofgeneralpurposeapproximateprograms
Towards neuralprocessingofgeneralpurposeapproximateprogramsParidha Saxena
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersUniversity of Huddersfield
 
Haskell for data science
Haskell for data scienceHaskell for data science
Haskell for data scienceJohn Cant
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsDebasish Ghosh
 
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Akihiro Hayashi
 
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Oswald Campesato
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningCastLabKAIST
 
Machine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMachine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMartin Zapletal
 
Hairong Qi V Swaminathan
Hairong Qi V SwaminathanHairong Qi V Swaminathan
Hairong Qi V SwaminathanFNian
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsGabriel Moreira
 

Similar to PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos (20)

Introduction to Deep Learning and Tensorflow
Introduction to Deep Learning and TensorflowIntroduction to Deep Learning and Tensorflow
Introduction to Deep Learning and Tensorflow
 
D3 D10 Unleashed New Features And Effects
D3 D10 Unleashed   New Features And EffectsD3 D10 Unleashed   New Features And Effects
D3 D10 Unleashed New Features And Effects
 
Alerting mechanism and algorithms introduction
Alerting mechanism and algorithms introductionAlerting mechanism and algorithms introduction
Alerting mechanism and algorithms introduction
 
Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017Feature Engineering - Getting most out of data for predictive models - TDC 2017
Feature Engineering - Getting most out of data for predictive models - TDC 2017
 
Large volume data analysis on the Typesafe Reactive Platform
Large volume data analysis on the Typesafe Reactive PlatformLarge volume data analysis on the Typesafe Reactive Platform
Large volume data analysis on the Typesafe Reactive Platform
 
FPGA Implementation of a GA
FPGA Implementation of a GAFPGA Implementation of a GA
FPGA Implementation of a GA
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
NeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximateProgramsNeuralProcessingofGeneralPurposeApproximatePrograms
NeuralProcessingofGeneralPurposeApproximatePrograms
 
Towards neuralprocessingofgeneralpurposeapproximateprograms
Towards neuralprocessingofgeneralpurposeapproximateprogramsTowards neuralprocessingofgeneralpurposeapproximateprograms
Towards neuralprocessingofgeneralpurposeapproximateprograms
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parameters
 
Haskell for data science
Haskell for data scienceHaskell for data science
Haskell for data science
 
Approximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming ApplicationsApproximation Data Structures for Streaming Applications
Approximation Data Structures for Streaming Applications
 
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
Exploration of Supervised Machine Learning Techniques for Runtime Selection o...
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)Diving into Deep Learning (Silicon Valley Code Camp 2017)
Diving into Deep Learning (Silicon Valley Code Camp 2017)
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
Machine learning at Scale with Apache Spark
Machine learning at Scale with Apache SparkMachine learning at Scale with Apache Spark
Machine learning at Scale with Apache Spark
 
Hairong Qi V Swaminathan
Hairong Qi V SwaminathanHairong Qi V Swaminathan
Hairong Qi V Swaminathan
 
Feature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive modelsFeature Engineering - Getting most out of data for predictive models
Feature Engineering - Getting most out of data for predictive models
 

More from AMD Developer Central

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsAMD Developer Central
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesAMD Developer Central
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAMD Developer Central
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceAMD Developer Central
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...AMD Developer Central
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozAMD Developer Central
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellAMD Developer Central
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonAMD Developer Central
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornAMD Developer Central
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevAMD Developer Central
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasAMD Developer Central
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...AMD Developer Central
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14AMD Developer Central
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14AMD Developer Central
 

More from AMD Developer Central (20)

DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIsDX12 & Vulkan: Dawn of a New Generation of Graphics APIs
DX12 & Vulkan: Dawn of a New Generation of Graphics APIs
 
Leverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math LibrariesLeverage the Speed of OpenCL™ with AMD Math Libraries
Leverage the Speed of OpenCL™ with AMD Math Libraries
 
Introduction to Node.js
Introduction to Node.jsIntroduction to Node.js
Introduction to Node.js
 
Media SDK Webinar 2014
Media SDK Webinar 2014Media SDK Webinar 2014
Media SDK Webinar 2014
 
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware WebinarAn Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
An Introduction to OpenCL™ Programming with AMD GPUs - AMD & Acceleware Webinar
 
DirectGMA on AMD’S FirePro™ GPUS
DirectGMA on AMD’S  FirePro™ GPUSDirectGMA on AMD’S  FirePro™ GPUS
DirectGMA on AMD’S FirePro™ GPUS
 
Webinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop IntelligenceWebinar: Whats New in Java 8 with Develop Intelligence
Webinar: Whats New in Java 8 with Develop Intelligence
 
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
The Small Batch (and other) solutions in Mantle API, by Guennadi Riguer, Mant...
 
Inside XBox- One, by Martin Fuller
Inside XBox- One, by Martin FullerInside XBox- One, by Martin Fuller
Inside XBox- One, by Martin Fuller
 
TressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas ThibierozTressFX The Fast and The Furry by Nicolas Thibieroz
TressFX The Fast and The Furry by Nicolas Thibieroz
 
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnellRendering Battlefield 4 with Mantle by Yuriy ODonnell
Rendering Battlefield 4 with Mantle by Yuriy ODonnell
 
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil PerssonLow-level Shader Optimization for Next-Gen and DX11 by Emil Persson
Low-level Shader Optimization for Next-Gen and DX11 by Emil Persson
 
Gcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodesGcn performance ftw by stephan hodes
Gcn performance ftw by stephan hodes
 
Inside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin FullerInside XBOX ONE by Martin Fuller
Inside XBOX ONE by Martin Fuller
 
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave OldcornDirect3D12 and the Future of Graphics APIs by Dave Oldcorn
Direct3D12 and the Future of Graphics APIs by Dave Oldcorn
 
Introduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan NevraevIntroduction to Direct 3D 12 by Ivan Nevraev
Introduction to Direct 3D 12 by Ivan Nevraev
 
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth ThomasHoly smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
Holy smoke! Faster Particle Rendering using Direct Compute by Gareth Thomas
 
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...Computer Vision Powered by Heterogeneous System Architecture (HSA) by  Dr. Ha...
Computer Vision Powered by Heterogeneous System Architecture (HSA) by Dr. Ha...
 
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
Rendering Battlefield 4 with Mantle by Johan Andersson - AMD at GDC14
 
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
RapidFire - the Easy Route to low Latency Cloud Gaming Solutions - AMD at GDC14
 

Recently uploaded

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesZilliz
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 

Recently uploaded (20)

The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Vector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector DatabasesVector Databases 101 - An introduction to the world of Vector Databases
Vector Databases 101 - An introduction to the world of Vector Databases
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

PT-4054, "OpenCL™ Accelerated Compute Libraries" by John Melonakos