SlideShare a Scribd company logo
Deep Learning Applications Design, Development and
Deployment in IoT Edge
Jayakumar. S PhD (IIT Bombay)
Object Automation, ​04/02/2020
Contents
Introduction 3
Power 9 based DL training 4
2.1 HW and SW Requirements 4
2.1.1 Talos™ II Entry-Level Developer System 4
2.1.2 Software 4
2.2 “DLtrain” resources in github 5
2.3 Build Tool Set 5
2.4 Build Process to Create “DLtrain” 5
2.5 Use DLtrain to train ANN model 6
2.6 Use DLtrain to Inferencing 6
Working with CUDA core 7
3.1 Hardware Used in CUDA computing 7
3.2 Driver Software Installation in Power AC922 7
3.3 Single Project using Power AC922 and GPU 8
3.3.1 Build Application 8
3.3.2 Sample Code in cu 9
Inference app in Android phone 13
4.1 Install NDK in Android Studio 13
4.2 Build Inference Engine 14
Update Inference Engine with Trained Model 15
Question: How to build toPhone/SndModel.jar ? 15
Question: How to use toPhone/SndModel.jar ? 15
Working with ML in Watson Studio 17
Visual Recognition in Watson Studio 17
Deploy Visual Recognition 17
Visual Recognition Client in Android Phone 19
1.Introduction
Intelligence IoT Edge is playing a critical role in services that require real time
inferencing. Historically, there have been systems with a high amount of
engineering complexity in terms of deployment and also in operation. For
example, SCADA is one such a system that has been working in the Power
Generation industry, Oil and Gas industry, Cement factories etc. In fact, SCADA
includes humans in loop and making it as Supervisory control and Data
acquisition. In the advent of Deep learning and its success in the modern Digital
side, there have been huge amounts of interest among researchers to carry
Deep learning Models to above mentioned industrial verticals and trying to bring
up Intelligent control and Data acquisition. In the place of Supervisor, it appears
that intelligent IoT edge coming up to perform those tasks that are handled by
Human beings in the form of Supervisor. Thus there is immense interest in
making IoT Edge as intelligent systems in these core engineering verticals apart
from consumer industry requirements. Lecture series designed to include NN
models, Train a NN model with training data ( mostly use MNIST ) and validate
trained NN models before going for deployment in IoT Edge. In the case of
deployment, there is a huge interest in making SmartPhones as IoT Edge such
that the same device can be used without much investment during the learning
time of each learner. However, in the case of industrial deployment are expected
to happen in devices like Jetson Nano, Ultra96-V2, mmWave Radar IWR 6843
etc. Maybe, as an advanced lecture series, there is a plan to cover the above
mentioned IoT Edges.
2.Power 9 based DL training
Along with Compute capability, it is important to have Ultra high speed I/O capability to
share training data with GPUs plus other Power AC922. Artificial Neural Network (ANN)
Model is created to perform classification of given image. The MNIST data set is used to
Train ANN models by using AC922.
2.1 HW and SW Requirements
2.1.1 ​Talos™ II Entry-Level Developer System
TLSDS3
Talos™ II Entry-Level Developer System including:
● EATX chassis with 500W ATX power supply
● A single Talos™ II Lite EATX mainboard
● One 4-core IBM POWER9 CPU
○ 4 cores per package
○ SMT4 capable
○ POWER IOMMU
○ Hardware virtualization extensions
○ 90W TDP
● One 3U POWER9 heatsink / fan (HSF) assembly
● 8GB DDR4 ECC registered memory
● 2 front panel USB 2.0 ports
● 128GB internal NVMe storage
● Recovery DV​D
https://www.raptorcs.com/content/TLSDS3/intro.html
2.1.2 ​Software
1. Ubuntu 18.04,
2. cmake version 3.10.2 ,
3. gcc -9 , g++9,
2.2 “DLtrain” resources in github
https://github.com/DLinIoTedge/DLtrain provides ready use applications to train ANN
based Deep Learning Model.
Deep Learning training application is coded in C and C++.
2.3 Build Tool Set
Ubuntu 18.04 based g++, gcc tool set used to ( cmake also used to create Make file) to
create DL application that is running in Power AC922.
2.4 Build Process to Create “DLtrain”
Making “DLtrain” application in Ubuntu 18.04 ( Power AC922 ) machine with
gcc-9 g++-9
​cd C++NNFast
rm -rf build
cd build
​ cmake -D CMAKE_C_COMPILER=gcc-9 -D CMAKE_CXX_COMPILER=g++-9 ..
make
2.5 Use DLtrain to train ANN model
Using, “DLtrain” application to train ANN models by using the MNIST data set.
​bin/DLtrain conf train ​// Train DL model
​bin/DLtrain conf train o ​// to overwrite tranined model
2.6 Use DLtrain to Inferencing
Using, “DLtrain” application to infer handwritten number numbers and stored in
28x28 pixel size images.
​ bin/DLtrain conf infer 3 ​// only all 3 from data set
bin/DLtrain conf infer ​<filename>​ ​ // only raw ...768 bytes... binary value
bin/DLtrain conf infer img.raw ​// sample file with number 4 in img.raw
3. Working with CUDA core
3.1 Hardware Used in CUDA computing
Hardware from Nvidia :​ GeForce RTX 2070
● GPU Architecture ​Turing
● NVIDIA CUDA​®​
Cores 2304
● RTX-OPS ​42T
● Boost Clock ​1620 MHz
● Frame Buffer ​8 GB GDDR6
● Memory Speed ​14 Gbps
3.2 Driver Software Installation in Power AC922
Get CUDA 10.2 for ubuntu 18.04 on ppc64le from the following link. Use wget to obtain
“cuda_10.2.89_440.33.01_linux_ppc64le.run”
http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.
2.89_440.33.01_linux_ppc64le.run
Install driver by using following command.
sudo sh cuda_10.2.89_440.33.01_linux_ppc64le.run
After above command, use nvidia-smi to get details on installed driver.
3.3 Single Project using Power AC922 and GPU
3.3.1 Build Application
File with *.cu extension is used to build application that runs partly in AC922 and also
partly in CUDA core of GPU
For GPU
nvcc is NVIDIA (R) Cuda compiler driver and its version “Cuda compilation tools,
release 10.1, V10.1.243 “
For Host CPU
g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
Use make ( gnu make 4.1 is used) tool to build applications.
jk@jkDL:~/NVIDIA_CUDA-10.1_Samples/0_Simple/vectorAdd$​ make
/usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30
-gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37
-gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52
-gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61
-gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75
-gencode arch=compute_75,code=compute_75 -o vectorAdd vectorAdd.o
mkdir -p ../../bin/ppc64le/linux/release
cp vectorAdd ../../bin/ppc64le/linux/release
Above make process creating “​vectorAdd” and the same is executed as given below.
jk@jkDL:~/NVIDIA_CUDA-10.1_Samples/0_Simple/vectorAdd$​ ​./vectorAdd
[Vector addition of 50000 elements]
Copy input data from the host memory to the CUDA device
CUDA kernel launch with 196 blocks of 256 threads
Copy output data from the CUDA device to the host memory
Test PASSED
Done
3.3.2 Sample Code in cu
In the following sample source code is given.
#include <stdio.h>
// For the CUDA runtime routines (prefixed with "cuda_")
#include <cuda_runtime.h>
#include <helper_cuda.h>
/**
* CUDA Kernel Device code
*
* Computes the vector addition of A and B into C. The 3 vectors have the same
* number of elements numElements.
*/
__global__ void
vectorAdd(const float *A, const float *B, float *C, int numElements)
{
int i = blockDim.x * blockIdx.x + threadIdx.x;
if (i < numElements)
{
C[i] = A[i] + B[i];
}
}
/**
* Host main routine
*/
int
main(void)
{
// Error code to check return values for CUDA calls
cudaError_t err = cudaSuccess;
// Print the vector length to be used, and compute its size
int numElements = 50000;
size_t size = numElements * sizeof(float);
printf("[Vector addition of %d elements]n", numElements);
// Allocate the host input vector A
float *h_A = (float *)malloc(size);
// Allocate the host input vector B
float *h_B = (float *)malloc(size);
// Allocate the host output vector C
float *h_C = (float *)malloc(size);
// Verify that allocations succeeded
if (h_A == NULL || h_B == NULL || h_C == NULL)
{
fprintf(stderr, "Failed to allocate host vectors!n");
exit(EXIT_FAILURE);
}
// Initialize the host input vectors
for (int i = 0; i < numElements; ++i)
{
h_A[i] = rand()/(float)RAND_MAX;
h_B[i] = rand()/(float)RAND_MAX;
}
// Allocate the device input vector A
float *d_A = NULL;
err = cudaMalloc((void **)&d_A, size);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to allocate device vector A (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Allocate the device input vector B
float *d_B = NULL;
err = cudaMalloc((void **)&d_B, size);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to allocate device vector B (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Allocate the device output vector C
float *d_C = NULL;
err = cudaMalloc((void **)&d_C, size);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to allocate device vector C (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Copy the host input vectors A and B in host memory to the device input vectors in
// device memory
printf("Copy input data from the host memory to the CUDA devicen");
err = cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to copy vector A from host to device (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
err = cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to copy vector B from host to device (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Launch the Vector Add CUDA Kernel
int threadsPerBlock = 256;
int blocksPerGrid =(numElements + threadsPerBlock - 1) / threadsPerBlock;
printf("CUDA kernel launch with %d blocks of %d threadsn", blocksPerGrid,
threadsPerBlock);
vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements);
err = cudaGetLastError();
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to launch vectorAdd kernel (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Copy the device result vector in device memory to the host result vector
// in host memory.
printf("Copy output data from the CUDA device to the host memoryn");
err = cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to copy vector C from device to host (error code %s)!n",
cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Verify that the result vector is correct
for (int i = 0; i < numElements; ++i)
{
if (fabs(h_A[i] + h_B[i] - h_C[i]) > 1e-5)
{
fprintf(stderr, "Result verification failed at element %d!n", i);
exit(EXIT_FAILURE);
}
}
printf("Test PASSEDn");
// Free device global memory
err = cudaFree(d_A);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to free device vector A (error code %s)!n", cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
err = cudaFree(d_B);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to free device vector B (error code %s)!n", cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
err = cudaFree(d_C);
if (err != cudaSuccess)
{
fprintf(stderr, "Failed to free device vector C (error code %s)!n", cudaGetErrorString(err));
exit(EXIT_FAILURE);
}
// Free host memory
free(h_A);
free(h_B);
free(h_C);
printf("Donen");
return 0;
}
4.Inference app in Android phone
4.1 Install NDK in Android Studio
ANN model based inference engine is coded in C++ and C. To make a library by using
inference engine code, there is a need to have NDK also installed in Android studio. In
the following installation of Android studio and also installation of NDK are given in
detail. Following is installed in Ubuntu 18.04 x86 machine.
Installation of dependencies
Android Studio requires OpenJDK version 8 or above to be installed to your system
sudo apt update
sudo apt install openjdk-8-jdk
java -version
Install Android Studio
sudo snap install android-studio --classic
start Android Studio either by typing android-studio in your terminal or by clicking on
the Android Studio icon (Activities -> Android Studio).
SDK required version
Use​ SDK 22 or above.
What is used in present build of J722 version of App is ​SDK 29
Install NDK
Use SDK manager to install the following components ( NDK). these components useful to build
JNI part of Inference engine
Packages to install:
- ​LLDB 3.1 (lldb;3.1)
- CMake 3.10.2.4988404 (cmake;3.10.2.4988404)
- NDK (Side by side) 20.1.5948944 (ndk;20.1.5948944)
Reference:
https://linuxize.com/post/how-to-install-android-studio-on-ubuntu-18-04/
4.2 Build Inference Engine
Using Android studio build Inference engine as given in the following workflow and copy
the same APK into Android phone .
Windows 10 or Ubuntu 18.04 with Android Studio ( the latest stable version of Android
Studio is version 3.3.1.0.) is used. NDK (Side by side) 20.1.5948944
(ndk;20.1.5948944) is used to handle JNI side lib creation for Inference Engine,
Inference engine full source code is given in “ ​https://github.com/DLinIoTedge/NN​ “
Following diagram provides information on workflow to create J722 application in the
form of APK that can run in Android phone.
5.Update Inference Engine with Trained Model
Model update application source code is given in this URL
https://github.com/DLinIoTedge/Send2Phone
Following IP network configuration is recommended
Send2Phone source code in JAVA and it is running in Power machine or in X86
machine.
Question: How to build toPhone/SndModel.jar ?
javac main.java
Question: How to use toPhone/SndModel.jar ?
java -jar toPhone/SndModel.jar
And also follow the above given workflow in Host CPU and also in Android Phone. Successfully
operations in the above result in “deployment of trained model” in Inference engine that runs in
Android smartphone.
6. Working with ML in Watson Studio
Details are given in the following link
ai-ml-watson.pdf
7. Visual Recognition in Watson Studio
Details are given in the following link
A beginner's guide to setting up a visual recognition service
8. Deploy Visual Recognition
// following worked
curl -X POST -u "apikey:​put your API key here​" -F "images_file=@sasi.jpeg" -F "threshold=0.6" -F
"classifier_ids=DefaultCustomModel_1738357304"
"https://gateway.watsonplatform.net/visual-recognition/api/v3/classify?version=2018-03-19"
{
"images": [
{
"classifiers": [
{
"classifier_id": "DefaultCustomModel_1738357304",
"name": "Default Custom Model",
"classes": [
{
"class": "lb1compress.zip",
"score": 0.855
}
]
}
],
"image": "sasi.jpeg"
}
],
"images_processed": 1,
"custom_classes": 3
}
Sample code for Client Application
…………………………. Pyrhan code ……………
pip
pip install --upgrade "watson-developer-cloud>=2.1.1"
Authentication
from watson_developer_cloud import VisualRecognitionV3
visual_recognition = VisualRecognitionV3(
version='{version}',
iam_apikey='{iam_api_key}'
)
Authentication (for instances created before May 23, 2018)
from watson_developer_cloud import VisualRecognitionV3
visual_recognition = VisualRecognitionV3(
version='{version}',
api_key='{api_key}'
)
Classify an image
import json
from watson_developer_cloud import VisualRecognitionV3
visual_recognition = VisualRecognitionV3(
'2018-03-19',
iam_apikey='{iam_api_key}')
with open('./fruitbowl.jpg', 'rb') as images_file:
classes = visual_recognition.classify(
images_file,
threshold='0.6',
classifier_ids='DefaultCustomModel_967878440').get_result()
print(json.dumps(classes, indent=2))
…………………………………………………………
9. Visual Recognition Client in Android Phone
Details are given in the following link
VR
///////////////////////////////////////////// ​document ends​ //////////////////////////////////////////////////////

More Related Content

What's hot

A Deep Dive into QtCanBus
A Deep Dive into QtCanBusA Deep Dive into QtCanBus
A Deep Dive into QtCanBus
Burkhard Stubert
 
[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools
[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools
[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages toolslaparuma
 
Introduction to OpenCL
Introduction to OpenCLIntroduction to OpenCL
Introduction to OpenCL
Unai Lopez-Novoa
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)
Rob Gillen
 
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
AMD Developer Central
 
第38回 名古屋CV・PRML勉強会 「Kinect v2本の紹介とPCLの概要」
第38回 名古屋CV・PRML勉強会 「Kinect v2本の紹介とPCLの概要」第38回 名古屋CV・PRML勉強会 「Kinect v2本の紹介とPCLの概要」
第38回 名古屋CV・PRML勉強会 「Kinect v2本の紹介とPCLの概要」
Tsukasa Sugiura
 
Developing new zynq based instruments
Developing new zynq based instrumentsDeveloping new zynq based instruments
Developing new zynq based instruments
Graham NAYLOR
 
Cuda introduction
Cuda introductionCuda introduction
Cuda introductionHanibei
 
Cuda
CudaCuda
OpenCL Programming 101
OpenCL Programming 101OpenCL Programming 101
OpenCL Programming 101
Yoss Cohen
 
Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Angela Mendoza M.
 
PL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil HenningPL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
AMD Developer Central
 
Introduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : NotesIntroduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : Notes
Subhajit Sahu
 
xapp744-HIL-Zynq-7000
xapp744-HIL-Zynq-7000xapp744-HIL-Zynq-7000
xapp744-HIL-Zynq-7000Umang Parekh
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDA
Martin Peniak
 
Linux Kernel , BSP, Boot Loader, ARM Engineer - Satish profile
Linux Kernel , BSP, Boot Loader, ARM Engineer - Satish profileLinux Kernel , BSP, Boot Loader, ARM Engineer - Satish profile
Linux Kernel , BSP, Boot Loader, ARM Engineer - Satish profile
Satish Kumar
 
Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDA
Raymond Tay
 
Kato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule
 

What's hot (20)

A Deep Dive into QtCanBus
A Deep Dive into QtCanBusA Deep Dive into QtCanBus
A Deep Dive into QtCanBus
 
Cuda 2011
Cuda 2011Cuda 2011
Cuda 2011
 
[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools
[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools
[01][gpu 컴퓨팅을 위한 언어, 도구 및 api] miller languages tools
 
Introduction to OpenCL
Introduction to OpenCLIntroduction to OpenCL
Introduction to OpenCL
 
Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)Intro to GPGPU with CUDA (DevLink)
Intro to GPGPU with CUDA (DevLink)
 
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
PT-4057, Automated CUDA-to-OpenCL™ Translation with CU2CL: What's Next?, by W...
 
第38回 名古屋CV・PRML勉強会 「Kinect v2本の紹介とPCLの概要」
第38回 名古屋CV・PRML勉強会 「Kinect v2本の紹介とPCLの概要」第38回 名古屋CV・PRML勉強会 「Kinect v2本の紹介とPCLの概要」
第38回 名古屋CV・PRML勉強会 「Kinect v2本の紹介とPCLの概要」
 
Developing new zynq based instruments
Developing new zynq based instrumentsDeveloping new zynq based instruments
Developing new zynq based instruments
 
Cuda introduction
Cuda introductionCuda introduction
Cuda introduction
 
Cuda
CudaCuda
Cuda
 
OpenCL Programming 101
OpenCL Programming 101OpenCL Programming 101
OpenCL Programming 101
 
Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08Nvidia cuda tutorial_no_nda_apr08
Nvidia cuda tutorial_no_nda_apr08
 
PL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil HenningPL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
PL-4048, Adapting languages for parallel processing on GPUs, by Neil Henning
 
Introduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : NotesIntroduction to CUDA C: NVIDIA : Notes
Introduction to CUDA C: NVIDIA : Notes
 
xapp744-HIL-Zynq-7000
xapp744-HIL-Zynq-7000xapp744-HIL-Zynq-7000
xapp744-HIL-Zynq-7000
 
Introduction to parallel computing using CUDA
Introduction to parallel computing using CUDAIntroduction to parallel computing using CUDA
Introduction to parallel computing using CUDA
 
Cuda Architecture
Cuda ArchitectureCuda Architecture
Cuda Architecture
 
Linux Kernel , BSP, Boot Loader, ARM Engineer - Satish profile
Linux Kernel , BSP, Boot Loader, ARM Engineer - Satish profileLinux Kernel , BSP, Boot Loader, ARM Engineer - Satish profile
Linux Kernel , BSP, Boot Loader, ARM Engineer - Satish profile
 
Introduction to CUDA
Introduction to CUDAIntroduction to CUDA
Introduction to CUDA
 
Kato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance ComputingKato Mivule: An Overview of CUDA for High Performance Computing
Kato Mivule: An Overview of CUDA for High Performance Computing
 

Similar to Deep Learning Edge

introduction to CUDA_C.pptx it is widely used
introduction to CUDA_C.pptx it is widely usedintroduction to CUDA_C.pptx it is widely used
introduction to CUDA_C.pptx it is widely used
Himanshu577858
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaRob Gillen
 
Go native benchmark test su dispositivi x86: java, ndk, ipp e tbb
Go native  benchmark test su dispositivi x86: java, ndk, ipp e tbbGo native  benchmark test su dispositivi x86: java, ndk, ipp e tbb
Go native benchmark test su dispositivi x86: java, ndk, ipp e tbb
JooinK
 
Build and run embedded apps faster from qt creator with docker
Build and run embedded apps faster from qt creator with dockerBuild and run embedded apps faster from qt creator with docker
Build and run embedded apps faster from qt creator with docker
Qt
 
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Windows Developer
 
Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!
Codemotion
 
Programming IoT with Docker: How to Start?
Programming IoT with Docker: How to Start?Programming IoT with Docker: How to Start?
Programming IoT with Docker: How to Start?
msyukor
 
VLSI Training presentation
VLSI Training presentationVLSI Training presentation
VLSI Training presentationDaola Khungur
 
What should you know about Net Core?
What should you know about Net Core?What should you know about Net Core?
What should you know about Net Core?
Damir Dobric
 
Speeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCSpeeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCC
inside-BigData.com
 
Ese 2008 RTSC Draft1
Ese 2008 RTSC Draft1Ese 2008 RTSC Draft1
Ese 2008 RTSC Draft1
drusso
 
generate IP CORES
generate IP CORESgenerate IP CORES
generate IP CORES
guest296013
 
Android on IA devices and Intel Tools
Android on IA devices and Intel ToolsAndroid on IA devices and Intel Tools
Android on IA devices and Intel ToolsXavier Hallade
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computingArka Ghosh
 
Scaling Docker Containers using Kubernetes and Azure Container Service
Scaling Docker Containers using Kubernetes and Azure Container ServiceScaling Docker Containers using Kubernetes and Azure Container Service
Scaling Docker Containers using Kubernetes and Azure Container Service
Ben Hall
 
Use Eclipse technologies to build a modern embedded IDE
Use Eclipse technologies to build a modern embedded IDEUse Eclipse technologies to build a modern embedded IDE
Use Eclipse technologies to build a modern embedded IDEBenjamin Cabé
 
Native Application Development With Qt
Native Application Development With QtNative Application Development With Qt
Native Application Development With Qtrahulnimbalkar
 

Similar to Deep Learning Edge (20)

introduction to CUDA_C.pptx it is widely used
introduction to CUDA_C.pptx it is widely usedintroduction to CUDA_C.pptx it is widely used
introduction to CUDA_C.pptx it is widely used
 
Intro to GPGPU Programming with Cuda
Intro to GPGPU Programming with CudaIntro to GPGPU Programming with Cuda
Intro to GPGPU Programming with Cuda
 
Go native benchmark test su dispositivi x86: java, ndk, ipp e tbb
Go native  benchmark test su dispositivi x86: java, ndk, ipp e tbbGo native  benchmark test su dispositivi x86: java, ndk, ipp e tbb
Go native benchmark test su dispositivi x86: java, ndk, ipp e tbb
 
Build and run embedded apps faster from qt creator with docker
Build and run embedded apps faster from qt creator with dockerBuild and run embedded apps faster from qt creator with docker
Build and run embedded apps faster from qt creator with docker
 
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
Build 2016 - B880 - Top 6 Reasons to Move Your C++ Code to Visual Studio 2015
 
Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!Lab Handson: Power your Creations with Intel Edison!
Lab Handson: Power your Creations with Intel Edison!
 
Programming IoT with Docker: How to Start?
Programming IoT with Docker: How to Start?Programming IoT with Docker: How to Start?
Programming IoT with Docker: How to Start?
 
VLSI Training presentation
VLSI Training presentationVLSI Training presentation
VLSI Training presentation
 
What should you know about Net Core?
What should you know about Net Core?What should you know about Net Core?
What should you know about Net Core?
 
Speeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCCSpeeding up Programs with OpenACC in GCC
Speeding up Programs with OpenACC in GCC
 
Ese 2008 RTSC Draft1
Ese 2008 RTSC Draft1Ese 2008 RTSC Draft1
Ese 2008 RTSC Draft1
 
generate IP CORES
generate IP CORESgenerate IP CORES
generate IP CORES
 
Android on IA devices and Intel Tools
Android on IA devices and Intel ToolsAndroid on IA devices and Intel Tools
Android on IA devices and Intel Tools
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Vpu technology &gpgpu computing
Vpu technology &gpgpu computingVpu technology &gpgpu computing
Vpu technology &gpgpu computing
 
Scaling Docker Containers using Kubernetes and Azure Container Service
Scaling Docker Containers using Kubernetes and Azure Container ServiceScaling Docker Containers using Kubernetes and Azure Container Service
Scaling Docker Containers using Kubernetes and Azure Container Service
 
Use Eclipse technologies to build a modern embedded IDE
Use Eclipse technologies to build a modern embedded IDEUse Eclipse technologies to build a modern embedded IDE
Use Eclipse technologies to build a modern embedded IDE
 
Native Application Development With Qt
Native Application Development With QtNative Application Development With Qt
Native Application Development With Qt
 

More from Ganesan Narayanasamy

Chip Design Curriculum development Residency program
Chip Design Curriculum development Residency programChip Design Curriculum development Residency program
Chip Design Curriculum development Residency program
Ganesan Narayanasamy
 
Basics of Digital Design and Verilog
Basics of Digital Design and VerilogBasics of Digital Design and Verilog
Basics of Digital Design and Verilog
Ganesan Narayanasamy
 
180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA
Ganesan Narayanasamy
 
Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture
Ganesan Narayanasamy
 
OpenPOWER Workshop at IIT Roorkee
OpenPOWER Workshop at IIT RoorkeeOpenPOWER Workshop at IIT Roorkee
OpenPOWER Workshop at IIT Roorkee
Ganesan Narayanasamy
 
Deep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systemsDeep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systems
Ganesan Narayanasamy
 
IBM BOA for POWER
IBM BOA for POWER IBM BOA for POWER
IBM BOA for POWER
Ganesan Narayanasamy
 
OpenPOWER System Marconi100
OpenPOWER System Marconi100OpenPOWER System Marconi100
OpenPOWER System Marconi100
Ganesan Narayanasamy
 
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
Ganesan Narayanasamy
 
POWER10 innovations for HPC
POWER10 innovations for HPCPOWER10 innovations for HPC
POWER10 innovations for HPC
Ganesan Narayanasamy
 
Deeplearningusingcloudpakfordata
DeeplearningusingcloudpakfordataDeeplearningusingcloudpakfordata
Deeplearningusingcloudpakfordata
Ganesan Narayanasamy
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
Ganesan Narayanasamy
 
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsAI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
Ganesan Narayanasamy
 
AI in healthcare - Use Cases
AI in healthcare - Use Cases AI in healthcare - Use Cases
AI in healthcare - Use Cases
Ganesan Narayanasamy
 
AI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systemsAI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systems
Ganesan Narayanasamy
 
AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems
Ganesan Narayanasamy
 
Poster from NUS
Poster from NUSPoster from NUS
Poster from NUS
Ganesan Narayanasamy
 
SAP HANA on POWER9 systems
SAP HANA on POWER9 systemsSAP HANA on POWER9 systems
SAP HANA on POWER9 systems
Ganesan Narayanasamy
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
Ganesan Narayanasamy
 
AI in the enterprise
AI in the enterprise AI in the enterprise
AI in the enterprise
Ganesan Narayanasamy
 

More from Ganesan Narayanasamy (20)

Chip Design Curriculum development Residency program
Chip Design Curriculum development Residency programChip Design Curriculum development Residency program
Chip Design Curriculum development Residency program
 
Basics of Digital Design and Verilog
Basics of Digital Design and VerilogBasics of Digital Design and Verilog
Basics of Digital Design and Verilog
 
180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA180 nm Tape out experience using Open POWER ISA
180 nm Tape out experience using Open POWER ISA
 
Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture Workload Transformation and Innovations in POWER Architecture
Workload Transformation and Innovations in POWER Architecture
 
OpenPOWER Workshop at IIT Roorkee
OpenPOWER Workshop at IIT RoorkeeOpenPOWER Workshop at IIT Roorkee
OpenPOWER Workshop at IIT Roorkee
 
Deep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systemsDeep Learning Use Cases using OpenPOWER systems
Deep Learning Use Cases using OpenPOWER systems
 
IBM BOA for POWER
IBM BOA for POWER IBM BOA for POWER
IBM BOA for POWER
 
OpenPOWER System Marconi100
OpenPOWER System Marconi100OpenPOWER System Marconi100
OpenPOWER System Marconi100
 
OpenPOWER Latest Updates
OpenPOWER Latest UpdatesOpenPOWER Latest Updates
OpenPOWER Latest Updates
 
POWER10 innovations for HPC
POWER10 innovations for HPCPOWER10 innovations for HPC
POWER10 innovations for HPC
 
Deeplearningusingcloudpakfordata
DeeplearningusingcloudpakfordataDeeplearningusingcloudpakfordata
Deeplearningusingcloudpakfordata
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
 
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systemsAI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
AI in healthcare and Automobile Industry using OpenPOWER/IBM POWER9 systems
 
AI in healthcare - Use Cases
AI in healthcare - Use Cases AI in healthcare - Use Cases
AI in healthcare - Use Cases
 
AI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systemsAI in Health Care using IBM Systems/OpenPOWER systems
AI in Health Care using IBM Systems/OpenPOWER systems
 
AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems AI in Healh Care using IBM POWER systems
AI in Healh Care using IBM POWER systems
 
Poster from NUS
Poster from NUSPoster from NUS
Poster from NUS
 
SAP HANA on POWER9 systems
SAP HANA on POWER9 systemsSAP HANA on POWER9 systems
SAP HANA on POWER9 systems
 
Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9Graphical Structure Learning accelerated with POWER9
Graphical Structure Learning accelerated with POWER9
 
AI in the enterprise
AI in the enterprise AI in the enterprise
AI in the enterprise
 

Recently uploaded

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
Prayukth K V
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Aggregage
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
ThomasParaiso2
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
Neo4j
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
RinaMondal9
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 

Recently uploaded (20)

DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024GraphSummit Singapore | The Art of the  Possible with Graph - Q2 2024
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 

Deep Learning Edge

  • 1. Deep Learning Applications Design, Development and Deployment in IoT Edge Jayakumar. S PhD (IIT Bombay) Object Automation, ​04/02/2020
  • 2. Contents Introduction 3 Power 9 based DL training 4 2.1 HW and SW Requirements 4 2.1.1 Talos™ II Entry-Level Developer System 4 2.1.2 Software 4 2.2 “DLtrain” resources in github 5 2.3 Build Tool Set 5 2.4 Build Process to Create “DLtrain” 5 2.5 Use DLtrain to train ANN model 6 2.6 Use DLtrain to Inferencing 6 Working with CUDA core 7 3.1 Hardware Used in CUDA computing 7 3.2 Driver Software Installation in Power AC922 7 3.3 Single Project using Power AC922 and GPU 8 3.3.1 Build Application 8 3.3.2 Sample Code in cu 9 Inference app in Android phone 13 4.1 Install NDK in Android Studio 13 4.2 Build Inference Engine 14 Update Inference Engine with Trained Model 15 Question: How to build toPhone/SndModel.jar ? 15 Question: How to use toPhone/SndModel.jar ? 15 Working with ML in Watson Studio 17 Visual Recognition in Watson Studio 17 Deploy Visual Recognition 17 Visual Recognition Client in Android Phone 19
  • 3. 1.Introduction Intelligence IoT Edge is playing a critical role in services that require real time inferencing. Historically, there have been systems with a high amount of engineering complexity in terms of deployment and also in operation. For example, SCADA is one such a system that has been working in the Power Generation industry, Oil and Gas industry, Cement factories etc. In fact, SCADA includes humans in loop and making it as Supervisory control and Data acquisition. In the advent of Deep learning and its success in the modern Digital side, there have been huge amounts of interest among researchers to carry Deep learning Models to above mentioned industrial verticals and trying to bring up Intelligent control and Data acquisition. In the place of Supervisor, it appears that intelligent IoT edge coming up to perform those tasks that are handled by Human beings in the form of Supervisor. Thus there is immense interest in making IoT Edge as intelligent systems in these core engineering verticals apart from consumer industry requirements. Lecture series designed to include NN models, Train a NN model with training data ( mostly use MNIST ) and validate trained NN models before going for deployment in IoT Edge. In the case of deployment, there is a huge interest in making SmartPhones as IoT Edge such that the same device can be used without much investment during the learning time of each learner. However, in the case of industrial deployment are expected to happen in devices like Jetson Nano, Ultra96-V2, mmWave Radar IWR 6843 etc. Maybe, as an advanced lecture series, there is a plan to cover the above mentioned IoT Edges.
  • 4. 2.Power 9 based DL training Along with Compute capability, it is important to have Ultra high speed I/O capability to share training data with GPUs plus other Power AC922. Artificial Neural Network (ANN) Model is created to perform classification of given image. The MNIST data set is used to Train ANN models by using AC922. 2.1 HW and SW Requirements 2.1.1 ​Talos™ II Entry-Level Developer System TLSDS3 Talos™ II Entry-Level Developer System including: ● EATX chassis with 500W ATX power supply ● A single Talos™ II Lite EATX mainboard ● One 4-core IBM POWER9 CPU ○ 4 cores per package ○ SMT4 capable ○ POWER IOMMU ○ Hardware virtualization extensions ○ 90W TDP ● One 3U POWER9 heatsink / fan (HSF) assembly ● 8GB DDR4 ECC registered memory ● 2 front panel USB 2.0 ports ● 128GB internal NVMe storage ● Recovery DV​D https://www.raptorcs.com/content/TLSDS3/intro.html 2.1.2 ​Software 1. Ubuntu 18.04, 2. cmake version 3.10.2 , 3. gcc -9 , g++9,
  • 5. 2.2 “DLtrain” resources in github https://github.com/DLinIoTedge/DLtrain provides ready use applications to train ANN based Deep Learning Model. Deep Learning training application is coded in C and C++. 2.3 Build Tool Set Ubuntu 18.04 based g++, gcc tool set used to ( cmake also used to create Make file) to create DL application that is running in Power AC922. 2.4 Build Process to Create “DLtrain” Making “DLtrain” application in Ubuntu 18.04 ( Power AC922 ) machine with gcc-9 g++-9 ​cd C++NNFast rm -rf build cd build ​ cmake -D CMAKE_C_COMPILER=gcc-9 -D CMAKE_CXX_COMPILER=g++-9 .. make
  • 6. 2.5 Use DLtrain to train ANN model Using, “DLtrain” application to train ANN models by using the MNIST data set. ​bin/DLtrain conf train ​// Train DL model ​bin/DLtrain conf train o ​// to overwrite tranined model 2.6 Use DLtrain to Inferencing Using, “DLtrain” application to infer handwritten number numbers and stored in 28x28 pixel size images. ​ bin/DLtrain conf infer 3 ​// only all 3 from data set bin/DLtrain conf infer ​<filename>​ ​ // only raw ...768 bytes... binary value bin/DLtrain conf infer img.raw ​// sample file with number 4 in img.raw
  • 7. 3. Working with CUDA core 3.1 Hardware Used in CUDA computing Hardware from Nvidia :​ GeForce RTX 2070 ● GPU Architecture ​Turing ● NVIDIA CUDA​®​ Cores 2304 ● RTX-OPS ​42T ● Boost Clock ​1620 MHz ● Frame Buffer ​8 GB GDDR6 ● Memory Speed ​14 Gbps 3.2 Driver Software Installation in Power AC922 Get CUDA 10.2 for ubuntu 18.04 on ppc64le from the following link. Use wget to obtain “cuda_10.2.89_440.33.01_linux_ppc64le.run” http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10. 2.89_440.33.01_linux_ppc64le.run Install driver by using following command. sudo sh cuda_10.2.89_440.33.01_linux_ppc64le.run After above command, use nvidia-smi to get details on installed driver.
  • 8. 3.3 Single Project using Power AC922 and GPU 3.3.1 Build Application File with *.cu extension is used to build application that runs partly in AC922 and also partly in CUDA core of GPU For GPU nvcc is NVIDIA (R) Cuda compiler driver and its version “Cuda compilation tools, release 10.1, V10.1.243 “ For Host CPU g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0 Use make ( gnu make 4.1 is used) tool to build applications. jk@jkDL:~/NVIDIA_CUDA-10.1_Samples/0_Simple/vectorAdd$​ make /usr/local/cuda/bin/nvcc -ccbin g++ -m64 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -gencode arch=compute_37,code=sm_37 -gencode arch=compute_50,code=sm_50 -gencode arch=compute_52,code=sm_52 -gencode arch=compute_60,code=sm_60 -gencode arch=compute_61,code=sm_61 -gencode arch=compute_70,code=sm_70 -gencode arch=compute_75,code=sm_75 -gencode arch=compute_75,code=compute_75 -o vectorAdd vectorAdd.o mkdir -p ../../bin/ppc64le/linux/release cp vectorAdd ../../bin/ppc64le/linux/release Above make process creating “​vectorAdd” and the same is executed as given below. jk@jkDL:~/NVIDIA_CUDA-10.1_Samples/0_Simple/vectorAdd$​ ​./vectorAdd [Vector addition of 50000 elements] Copy input data from the host memory to the CUDA device CUDA kernel launch with 196 blocks of 256 threads Copy output data from the CUDA device to the host memory Test PASSED Done
  • 9. 3.3.2 Sample Code in cu In the following sample source code is given. #include <stdio.h> // For the CUDA runtime routines (prefixed with "cuda_") #include <cuda_runtime.h> #include <helper_cuda.h> /** * CUDA Kernel Device code * * Computes the vector addition of A and B into C. The 3 vectors have the same * number of elements numElements. */ __global__ void vectorAdd(const float *A, const float *B, float *C, int numElements) { int i = blockDim.x * blockIdx.x + threadIdx.x; if (i < numElements) { C[i] = A[i] + B[i]; } } /** * Host main routine */ int main(void) { // Error code to check return values for CUDA calls cudaError_t err = cudaSuccess; // Print the vector length to be used, and compute its size int numElements = 50000; size_t size = numElements * sizeof(float); printf("[Vector addition of %d elements]n", numElements); // Allocate the host input vector A float *h_A = (float *)malloc(size); // Allocate the host input vector B float *h_B = (float *)malloc(size);
  • 10. // Allocate the host output vector C float *h_C = (float *)malloc(size); // Verify that allocations succeeded if (h_A == NULL || h_B == NULL || h_C == NULL) { fprintf(stderr, "Failed to allocate host vectors!n"); exit(EXIT_FAILURE); } // Initialize the host input vectors for (int i = 0; i < numElements; ++i) { h_A[i] = rand()/(float)RAND_MAX; h_B[i] = rand()/(float)RAND_MAX; } // Allocate the device input vector A float *d_A = NULL; err = cudaMalloc((void **)&d_A, size); if (err != cudaSuccess) { fprintf(stderr, "Failed to allocate device vector A (error code %s)!n", cudaGetErrorString(err)); exit(EXIT_FAILURE); } // Allocate the device input vector B float *d_B = NULL; err = cudaMalloc((void **)&d_B, size); if (err != cudaSuccess) { fprintf(stderr, "Failed to allocate device vector B (error code %s)!n", cudaGetErrorString(err)); exit(EXIT_FAILURE); } // Allocate the device output vector C float *d_C = NULL; err = cudaMalloc((void **)&d_C, size); if (err != cudaSuccess) { fprintf(stderr, "Failed to allocate device vector C (error code %s)!n", cudaGetErrorString(err)); exit(EXIT_FAILURE);
  • 11. } // Copy the host input vectors A and B in host memory to the device input vectors in // device memory printf("Copy input data from the host memory to the CUDA devicen"); err = cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice); if (err != cudaSuccess) { fprintf(stderr, "Failed to copy vector A from host to device (error code %s)!n", cudaGetErrorString(err)); exit(EXIT_FAILURE); } err = cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice); if (err != cudaSuccess) { fprintf(stderr, "Failed to copy vector B from host to device (error code %s)!n", cudaGetErrorString(err)); exit(EXIT_FAILURE); } // Launch the Vector Add CUDA Kernel int threadsPerBlock = 256; int blocksPerGrid =(numElements + threadsPerBlock - 1) / threadsPerBlock; printf("CUDA kernel launch with %d blocks of %d threadsn", blocksPerGrid, threadsPerBlock); vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, numElements); err = cudaGetLastError(); if (err != cudaSuccess) { fprintf(stderr, "Failed to launch vectorAdd kernel (error code %s)!n", cudaGetErrorString(err)); exit(EXIT_FAILURE); } // Copy the device result vector in device memory to the host result vector // in host memory. printf("Copy output data from the CUDA device to the host memoryn"); err = cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost); if (err != cudaSuccess) { fprintf(stderr, "Failed to copy vector C from device to host (error code %s)!n", cudaGetErrorString(err)); exit(EXIT_FAILURE);
  • 12. } // Verify that the result vector is correct for (int i = 0; i < numElements; ++i) { if (fabs(h_A[i] + h_B[i] - h_C[i]) > 1e-5) { fprintf(stderr, "Result verification failed at element %d!n", i); exit(EXIT_FAILURE); } } printf("Test PASSEDn"); // Free device global memory err = cudaFree(d_A); if (err != cudaSuccess) { fprintf(stderr, "Failed to free device vector A (error code %s)!n", cudaGetErrorString(err)); exit(EXIT_FAILURE); } err = cudaFree(d_B); if (err != cudaSuccess) { fprintf(stderr, "Failed to free device vector B (error code %s)!n", cudaGetErrorString(err)); exit(EXIT_FAILURE); } err = cudaFree(d_C); if (err != cudaSuccess) { fprintf(stderr, "Failed to free device vector C (error code %s)!n", cudaGetErrorString(err)); exit(EXIT_FAILURE); } // Free host memory free(h_A); free(h_B); free(h_C); printf("Donen"); return 0; }
  • 13. 4.Inference app in Android phone 4.1 Install NDK in Android Studio ANN model based inference engine is coded in C++ and C. To make a library by using inference engine code, there is a need to have NDK also installed in Android studio. In the following installation of Android studio and also installation of NDK are given in detail. Following is installed in Ubuntu 18.04 x86 machine. Installation of dependencies Android Studio requires OpenJDK version 8 or above to be installed to your system sudo apt update sudo apt install openjdk-8-jdk java -version Install Android Studio sudo snap install android-studio --classic start Android Studio either by typing android-studio in your terminal or by clicking on the Android Studio icon (Activities -> Android Studio). SDK required version Use​ SDK 22 or above. What is used in present build of J722 version of App is ​SDK 29 Install NDK Use SDK manager to install the following components ( NDK). these components useful to build JNI part of Inference engine Packages to install: - ​LLDB 3.1 (lldb;3.1) - CMake 3.10.2.4988404 (cmake;3.10.2.4988404) - NDK (Side by side) 20.1.5948944 (ndk;20.1.5948944) Reference: https://linuxize.com/post/how-to-install-android-studio-on-ubuntu-18-04/
  • 14. 4.2 Build Inference Engine Using Android studio build Inference engine as given in the following workflow and copy the same APK into Android phone . Windows 10 or Ubuntu 18.04 with Android Studio ( the latest stable version of Android Studio is version 3.3.1.0.) is used. NDK (Side by side) 20.1.5948944 (ndk;20.1.5948944) is used to handle JNI side lib creation for Inference Engine, Inference engine full source code is given in “ ​https://github.com/DLinIoTedge/NN​ “ Following diagram provides information on workflow to create J722 application in the form of APK that can run in Android phone.
  • 15. 5.Update Inference Engine with Trained Model Model update application source code is given in this URL https://github.com/DLinIoTedge/Send2Phone Following IP network configuration is recommended Send2Phone source code in JAVA and it is running in Power machine or in X86 machine. Question: How to build toPhone/SndModel.jar ? javac main.java Question: How to use toPhone/SndModel.jar ? java -jar toPhone/SndModel.jar
  • 16. And also follow the above given workflow in Host CPU and also in Android Phone. Successfully operations in the above result in “deployment of trained model” in Inference engine that runs in Android smartphone.
  • 17. 6. Working with ML in Watson Studio Details are given in the following link ai-ml-watson.pdf 7. Visual Recognition in Watson Studio Details are given in the following link A beginner's guide to setting up a visual recognition service 8. Deploy Visual Recognition // following worked curl -X POST -u "apikey:​put your API key here​" -F "images_file=@sasi.jpeg" -F "threshold=0.6" -F "classifier_ids=DefaultCustomModel_1738357304" "https://gateway.watsonplatform.net/visual-recognition/api/v3/classify?version=2018-03-19" { "images": [ { "classifiers": [ { "classifier_id": "DefaultCustomModel_1738357304", "name": "Default Custom Model", "classes": [ { "class": "lb1compress.zip", "score": 0.855
  • 18. } ] } ], "image": "sasi.jpeg" } ], "images_processed": 1, "custom_classes": 3 } Sample code for Client Application …………………………. Pyrhan code …………… pip pip install --upgrade "watson-developer-cloud>=2.1.1" Authentication from watson_developer_cloud import VisualRecognitionV3 visual_recognition = VisualRecognitionV3( version='{version}', iam_apikey='{iam_api_key}' ) Authentication (for instances created before May 23, 2018) from watson_developer_cloud import VisualRecognitionV3 visual_recognition = VisualRecognitionV3( version='{version}', api_key='{api_key}' ) Classify an image import json from watson_developer_cloud import VisualRecognitionV3 visual_recognition = VisualRecognitionV3( '2018-03-19', iam_apikey='{iam_api_key}') with open('./fruitbowl.jpg', 'rb') as images_file: classes = visual_recognition.classify( images_file, threshold='0.6', classifier_ids='DefaultCustomModel_967878440').get_result() print(json.dumps(classes, indent=2)) …………………………………………………………
  • 19. 9. Visual Recognition Client in Android Phone Details are given in the following link VR ///////////////////////////////////////////// ​document ends​ //////////////////////////////////////////////////////