CUDA PARALLEL
PROGRAMMING AND
GAMING MEETUP
RIGA, LATVIA, JANUARY 12, 2015
Rihards Gailums
Twitter: @RihardsGailums
rihards.gailums@rhtu.edu.lv
• Personal Introductions;
• CES Nvidia news;
• Jetson TK1;
• Announcements;
• Summary
Agenda
Who am I
What I am doing
What is my parallel programming
[CUDA] experience
Introduction
Exponential Technologies
Drones AUTOMOTIVE 3D Printing Artificial intelligence
Biotechnology Design ROBOTICS ComputerVision
CES 2015
Parrot drone + Jetson TK1
Featured events
The Startup
University
Latvia Tech and
Entrepreneurship
meetup network
• Riga BioTechnology Meetup (Kick-off Meetup, December 12, THE Mill)
• UI/UX Riga Meetup (December Meetup, December 15, THE Mill)
• Riga Drone Meetup (Drone Kick-off Meetup, December 16, THE Mill)
• Riga Mobile App Developer Meetup (Kick-off Meetup, December 17, THE Mill)
• Riga Startup: Idea to IPO (Find Your Co-founder, December 18, THE Mill)
• CUDA parallel programming and gaming meetup Riga (December CUDA Meetup, December 22, THE Mill)
• 3D Printing Riga Meetup (January 3D Printing, January 8, RTU Design Factory)
• Bitcoin and Cryptocurrencies Meetup (January Meetup, January 26, THE Mill)
Sponsors
Technology and Entrepreneurship education
Parallel computing
Research Center
CUDA LABORATORY
Jetson TK1
http://www.anandtech.com/show/7905/nvidia-announces-jetson-tk1-dev-board-adds-erista-to-tegra-roadmap
https://www.youtube.com/watch?v=aw01HwTN1MM
What's in the Box
Login Credentials
Username: ubuntu
Password: ubuntu
Install the NVIDIA Linux driver binary release on your target located in:
${HOME}/NVIDIA-INSTALLER
Step 1)
Change directories into the NVIDIA installation directory:
cd ${HOME}/NVIDIA-INSTALLER
Step 2)
Run the installer script to extract and install the Linux driver binary release:
sudo ./installer.sh
Step 3)
Reboot the system to have the graphical desktop UI come up.
CUDA SDK Demo Samples
• Particles
• Nbody
• Smokeparticles
• waves
http://elinux.org/Jetson_TK1
Jetson/Installing CUDA
http://elinux.org/Jetson/Installing_CUDA
Jetson/Installing OpenCV
http://elinux.org/Jetson/Installing_OpenCV
OpenCV
http://docs.opencv.org/doc/tutorials/tutorials.html
http://docs.opencv.org/doc/tutorials/objdetect/cascade_classifier/cascade_classifier.html#cascade-classifier
Object Detection
Jetson TEGRA TK1
Tegra K1 SOC
• Kepler GPU with 192 CUDA cores
• 4-Plus-1 quad-core ARM Cortex A15 CPU
• 2 GB x16 memory with 64 bit width
• 16 GB 4.51 eMMC memory
• 1 Half mini-PCIE slot
• 1 Full size SD/MMC connector
• 1 Full-size HDMI port
• 1 USB 2.0 port, micro AB
• 1 USB 3.0 port, A
• 1 RS232 serial port
• 1 ALC5639 Realtek Audio codec with Mic in and Line out
• 1 RTL8111GS Realtek GigE LAN
• 1 SATA data port
• SPI 4MByte boot flash
Research projects?
NVENC Based H.264 Encoding for Virtual Machine Based Monitor Wall Architecture
R. Bundulis (rudolfs.bundulis@lu.lv), G. Arnicans (guntis.arnicans@lu.lv), and R. Gailums (rihards.gailums@rhtu.edu.lv)
University of Latvia / Riga High Tech University, Latvia
Introduction
• The IT industry is seeing growing demand for high-resolution display surfaces.
• Use cases for such surfaces include satellite and map data, X-ray and microscope images, multimedia, CCTV, etc.
• Existing solutions are not scalable, do not offer hardware abstraction, and suffer from wiring limitations.
Currently Popular Monitor Wall Architectures
Display wall architecture where each output of the GPU maps to a tile on the display wall:
Pro: The OS can natively manage the displays
Con: Power consumption; the supported monitor count is limited by the output count of the GPUs and by the expansion slots for the GPUs on the motherboard; deployment is limited by wiring
Display wall architecture where each output of the GPU is split/upscaled among the tiles on the display wall (splitter/scaler):
Pro: Software complexity is reduced since it does not have to be multiple-monitor aware
Con: Small resolution and DPI; the visualization is not displayed in its native resolution
Con: Expensive
Proposed Virtual Machine Based Monitor Wall Architecture
The proposed display wall architecture, where each output of a virtual GPU maps to a tile on the display wall and is transmitted as an H.264 stream over LAN (H.264/RTP/LAN):
Pro: Scalable; the host machine can run multiple virtual machines, and multiple virtualized GPUs map to physical GPUs to maximize efficiency
Pro: The LAN connection to the display wall removes the wire length limitations imposed by DVI/HDMI cables
Pro: The total resolution of the wall goes beyond what can be achieved using physical hardware
Con: Lossy compression
Con: No Direct3D or OpenGL support
• The host machine collects the framebuffer data from the virtual machine GPUs and performs H.264 encoding of the video stream on the physical host GPU. The architecture therefore relies heavily on a fast hardware H.264 encoder, which leaves the CPU fully available to the hosted virtual machines.
• Non-FPS-intensive use cases allow a great number of virtual monitors to be hosted on a single physical GPU, thus reducing the power consumption.
[Photo: virtual machine based monitor wall running Google Maps inside the Chrome web browser on 16 tiles at 1920x1080 pixels each, giving a total resolution of roughly 32 megapixels. Each tile has a dedicated LAN connection and H.264 decoder.]
Why NVENC?
Currently there are two main alternatives for the H.264 encoder in this architecture: Intel Quick Sync and NVENC. NVENC is more feasible because:
• The total encoding power can be increased by stacking multiple GPUs that support NVENC without penalties, while not all Intel Quick Sync GPUs have built-in video memory, so scaling these cards introduces a performance penalty from using system memory.
• NVENC does not put any limitations on other components of the system, while Intel Quick Sync supports a limited number of CPUs.
• Current benchmarks seem to show that the overall FPS performance for a single GPU (which is the main criterion for this architecture) is better for NVENC than for Intel Quick Sync.
Scalability
The graph below demonstrates the scalability possibilities in terms of the maximum possible number of connected monitors for the traditional architecture versus the proposed one on a Quadro K4000 card that has 4 outputs.
[Chart: Scalability of supported monitor count; labels "Using Video…" and "Maximum"; axis from 0 to 100]
Conclusions
• The current experiments show that this architecture is very feasible for non-FPS-intensive use cases where the display wall can be driven by a single physical GPU.
• The total resolution provided by this architecture, even using the currently available compression technology, greatly exceeds the resolutions of existing solutions, and the resolution is expected to grow in the future.
• The architecture itself scales very well. It is limited mainly by OS support for multiple monitors (this can be overcome by simulating a single high-resolution display in the virtual machine that spans the whole resolution of the physical wall) and by the possibility of stacking multiple GPUs in the host system.
• Future work should focus on the ability to virtualize OpenGL and Direct3D to remove the advantages of non-virtualized architectures.
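As a quick check of the resolution figure quoted for the 16-tile wall, the pixel count can be worked out directly. This is a minimal sketch: the function names `total_pixels` and `binary_megapixels` are illustrative and do not come from the poster.

```cpp
#include <cassert>
#include <cstdint>

// Total pixel count of a display wall built from identical tiles.
// (Illustrative helper; not part of the poster's software.)
std::uint64_t total_pixels(std::uint64_t tiles, std::uint64_t width, std::uint64_t height) {
    assert(tiles > 0 && width > 0 && height > 0);
    return tiles * width * height;
}

// The same count expressed in units of 2^20 pixels.
double binary_megapixels(std::uint64_t pixels) {
    return static_cast<double>(pixels) / (1024.0 * 1024.0);
}
```

For the wall in the poster, `total_pixels(16, 1920, 1080)` gives 33,177,600 pixels, which is about 33.2 million pixels or about 31.6 when counted in units of 2^20, in line with the "32 megapixels" quoted.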
For Startups by Meetup members:
$1800 per year of FREE Azure cloud services
Free Microsoft software and tools
Latvijas Garantiju Aģentūra
A government corporation that supports Latvian entrepreneurs
and helps them realise their business ideas.
Featured Resources
Featured Startup
Speakers
Mobile APP Startup pitch
Why GPU Computing
GPU CPU
Add GPUs: Accelerate Science Applications
© NVIDIA 2013
Small Changes, Big Speed-up
Application Code
• GPU: Use GPU to Parallelize Compute-Intensive Functions
• CPU: Rest of Sequential CPU Code
Fastest Performance on Scientific Applications
Tesla K20X Speed-Up over Sandy Bridge CPUs
CPU results: Dual socket E5-2687w, 3.10 GHz, GPU results: Dual socket E5-2687w + 2 Tesla K20X GPUs
*MATLAB results comparing one i7-2600K CPU vs with Tesla K20 GPU
Disclaimer: Non-NVIDIA implementations may not have been fully optimized
[Chart: speed-up from 0.0x to 20.0x for AMBER (Molecular Dynamics), SPECFEM3D (Earth Science), Chroma (Physics), and MATLAB (FFT)* (Engineering)]
Why Computing Perf/Watt Matters?
Traditional CPUs are
not economically feasible
2.3 PFlops 7000 homes
7.0
Megawatts
7.0
Megawatts
CPU
Optimized for
Serial Tasks
GPU Accelerator
Optimized for Many
Parallel Tasks
10x performance/socket
> 5x energy efficiency
Era of GPU-accelerated
computing is here
World’s Fastest, Most Energy Efficient Accelerator
[Chart: SGEMM (TFLOPS) versus DGEMM (TFLOPS) for Tesla K20X, Tesla K20, Xeon CPU E5-2690, and Xeon Phi 225W]
Tesla K20X vs Xeon CPU
8x Faster SGEMM
6x Faster DGEMM
Tesla K20X vs Xeon Phi
90% Faster SGEMM
60% Faster DGEMM
Introduction to the
CUDA Platform
CUDA Parallel Computing Platform
www.nvidia.com/getcuda
• Programming Approaches: Libraries ("Drop-in" Acceleration); OpenACC Directives (Easily Accelerate Apps); Programming Languages (Maximum Flexibility)
• Development Environment: Nsight IDE (Linux, Mac and Windows; GPU Debugging and Profiling); CUDA-GDB debugger; NVIDIA Visual Profiler
• Open Compiler Tool Chain: Enables compiling new languages to the CUDA platform, and CUDA languages to other architectures
• Hardware Capabilities: SMX, Dynamic Parallelism, HyperQ, GPUDirect
3 Ways to Accelerate Applications
• Libraries: "Drop-in" Acceleration
• OpenACC Directives: Easily Accelerate Applications
• Programming Languages: Maximum Flexibility
Libraries: Easy, High-Quality
Acceleration
• Ease of use: Using libraries enables GPU acceleration without in-depth
knowledge of GPU programming
• “Drop-in”: Many GPU-accelerated libraries follow standard APIs, thus
enabling acceleration with minimal code changes
• Quality: Libraries offer high-quality implementations of functions
encountered in a broad range of applications
• Performance: NVIDIA libraries are tuned by experts
Some GPU-accelerated Libraries
• NVIDIA cuBLAS
• NVIDIA cuRAND
• NVIDIA cuSPARSE
• NVIDIA NPP
• NVIDIA cuFFT
• Vector Signal Image Processing
• GPU Accelerated Linear Algebra
• Matrix Algebra on GPU and Multicore
• C++ STL Features for CUDA
• IMSL Library
• Building-block Algorithms for CUDA
• ArrayFire Matrix Computations
• Sparse Linear Algebra
3 Steps to CUDA-accelerated
application
• Step 1: Substitute library calls with equivalent CUDA library calls
saxpy ( … ) → cublasSaxpy ( … )
• Step 2: Manage data locality
- with CUDA: cudaMalloc(), cudaMemcpy(), etc.
- with CUBLAS: cublasAlloc(), cublasSetVector(), etc.
• Step 3: Rebuild and link the CUDA-accelerated library
nvcc myobj.o -lcublas
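To make Step 1 concrete, it helps to see what the substituted call computes. The sketch below is a plain CPU reference for SAXPY (y = a*x + y), not the cuBLAS call itself. The name `saxpy_reference` is illustrative. As far as I know, the handle-based cuBLAS API takes a handle, the element count, a pointer to the scalar, and each vector with its stride, but check the cuBLAS documentation for the exact signature in your toolkit version.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for SAXPY: y[i] = a * x[i] + y[i] for every element.
// A GPU port would replace this loop with a cuBLAS call after the
// vectors have been copied to device memory (Step 2 above).
void saxpy_reference(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());  // both vectors must have the same length
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Keeping a CPU reference like this next to the accelerated path is a common way to validate results after the library call has been swapped in.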
Explore the CUDA (Libraries) Ecosystem
• CUDA Tools and Ecosystem
described in detail on NVIDIA
Developer Zone:
developer.nvidia.com/cuda-tools-ecosystem
OpenACC Directives
Program myscience
   ... serial code ...
!$acc kernels
   do k = 1,n1
      do i = 1,n2
         ... parallel code ...
      enddo
   enddo
!$acc end kernels
   ...
End Program myscience
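The same directive pattern can be sketched in C++. This is a minimal sketch under stated assumptions: the function name `fill_grid` and the loop body (storing k * i) are made up for illustration and merely stand in for the "parallel code" placeholder in the Fortran example. A compiler without OpenACC support ignores the pragma and runs the loops serially on the CPU, so the result is the same either way.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Fill an n1 x n2 row-major grid so that element (k, i) holds k * i.
// The pragma is a hint: an OpenACC compiler may offload the loop nest,
// while other compilers ignore it and execute the loops serially.
std::vector<double> fill_grid(int n1, int n2) {
    assert(n1 > 0 && n2 > 0);
    std::vector<double> grid(static_cast<std::size_t>(n1) * static_cast<std::size_t>(n2));
    double* g = grid.data();
    const int total = n1 * n2;

    #pragma acc kernels copyout(g[0:total])
    for (int k = 0; k < n1; ++k) {
        for (int i = 0; i < n2; ++i) {
            g[k * n2 + i] = static_cast<double>(k) * i;  // stand-in for the parallel work
        }
    }
    return grid;
}
```

The existing loop structure is left untouched and only the directive is added, which is the point the slide makes about directives.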
• Simple compiler hints are added to your original Fortran or C code
• The OpenACC compiler parallelizes the hinted code
• Works on many-core GPUs & multicore CPUs
• Easy: Directives are the easy path to accelerate
compute intensive applications
• Open: OpenACC is an open GPU directives standard,
making GPU programming straightforward and
portable across parallel and multi-core processors
• Powerful: GPU Directives allow complete access to the
massive parallel power of a GPU
OpenACC
The Standard for GPU Directives
Directives: Easy & Powerful
• Real-Time Object Detection (Global Manufacturer of Navigation Systems): 5x in 40 Hours
• Valuation of Stock Portfolios using Monte Carlo (Global Technology Consulting Company): 2x in 4 Hours
• Interaction of Solvents and Biomolecules (University of Texas at San Antonio): 5x in 8 Hours
“Optimizing code with directives is quite easy, especially compared to CPU threads or writing CUDA kernels. The most important thing is avoiding restructuring of existing code for production applications.”
-- Developer at the Global Manufacturer of Navigation Systems
Start Now with OpenACC Directives
Free trial license to PGI
Accelerator
Tools for quick ramp
www.nvidia.com/gpudirectives
Sign up for a free trial of
the directives compiler
now!
GPU Programming Languages
Fortran: OpenACC, CUDA Fortran
C: OpenACC, CUDA C
C++: Thrust, CUDA C++
Python: PyCUDA, Copperhead
F#: Alea.cuBase
Numerical analytics: MATLAB, Mathematica, LabVIEW
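#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdlib>

int main() {
    // generate 32M random numbers on host
    thrust::host_vector<int> h_vec(32 << 20);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);

    // transfer data to device (GPU)
    thrust::device_vector<int> d_vec = h_vec;

    // sort data on device
    thrust::sort(d_vec.begin(), d_vec.end());

    // transfer data back to host
    thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
    return 0;
}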
Rapid Parallel C++ Development
• Resembles C++ STL
• High-level interface
• Enhances developer
productivity
• Enables performance
portability between GPUs and
multicore CPUs
• Flexible
• CUDA, OpenMP, and TBB
backends
• Extensible and customizable
• Integrates with existing
software
• Open source
http://developer.nvidia.com/thrust or http://thrust.googlecode.com
Learn More
These languages are supported on all CUDA-capable GPUs.
You might already have a CUDA-capable GPU in your laptop
or desktop PC!
CUDA C/C++
http://developer.nvidia.com/cuda-toolkit
Thrust C++ Template Library
http://developer.nvidia.com/thrust
CUDA Fortran
http://developer.nvidia.com/cuda-toolkit
GPU.NET
http://tidepowerd.com
PyCUDA (Python)
http://mathema.tician.de/software/pycuda
MATLAB
http://www.mathworks.com/discovery/matlab-gpu.html
Mathematica
http://www.wolfram.com/mathematica/new-in-8/cuda-and-opencl-support/
Getting Started
• Download CUDA Toolkit & SDK: www.nvidia.com/getcuda
• Nsight IDE (Eclipse or Visual Studio): www.nvidia.com/nsight
• Programming Guide/Best Practices:
• docs.nvidia.com
• Questions:
• NVIDIA Developer forums: devtalk.nvidia.com
• Search or ask on: www.stackoverflow.com/tags/cuda
• General: www.nvidia.com/cudazone
https://www.youtube.com/watch?v=IzU4AVcMFys
Intro to CUDA - An introduction, how-to, to NVIDIA's GPU parallel
Mythbusters Demo GPU versus CPU: http://youtu.be/-P28LKWTzrI
Research projects?
NVIDIA GTX 750Ti
• Nvidia MAXWELL technology
• Cost – 170 USD
• Only 60 W of power, no dedicated power connector
• 250 MHash/s
Vs
• Nvidia GTX 780: 350 MHash/s + power consumption
• Nvidia TESLA K40: 560 MHash/s + power consumption
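The comparison above comes down to throughput per watt and per dollar. A minimal sketch of that arithmetic follows. The helper names `mhash_per_watt` and `mhash_per_usd` are illustrative. Only the GTX 750 Ti has both a power and a price figure on the slide, so the other two cards cannot be evaluated the same way from this data alone.

```cpp
#include <cassert>

// Hash throughput per watt of board power (MHash/s per W).
double mhash_per_watt(double mhash_per_s, double watts) {
    assert(watts > 0.0);
    return mhash_per_s / watts;
}

// Hash throughput per US dollar of purchase price (MHash/s per USD).
double mhash_per_usd(double mhash_per_s, double usd) {
    assert(usd > 0.0);
    return mhash_per_s / usd;
}
```

With the slide's figures for the GTX 750 Ti (250 MHash/s, 60 W, 170 USD), this works out to roughly 4.17 MHash/s per watt and roughly 1.47 MHash/s per dollar.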
Latvian CUDA & parallel programming
ecosystem
Next meetups, frequency
Speakers
Topics
Group marketing channels

More Related Content

What's hot

Embedded and Reliable Computer Vision
Embedded and Reliable Computer VisionEmbedded and Reliable Computer Vision
Embedded and Reliable Computer Vision
NVIDIA Taiwan
 
PG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry KozlovPG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry Kozlov
AMD Developer Central
 
Tegra 4i expands the market
Tegra 4i expands the marketTegra 4i expands the market
Tegra 4i expands the market
Brian Caulfield
 

What's hot (20)

NVIDIA 深度學習教育機構 (DLI): Neural network deployment
NVIDIA 深度學習教育機構 (DLI): Neural network deploymentNVIDIA 深度學習教育機構 (DLI): Neural network deployment
NVIDIA 深度學習教育機構 (DLI): Neural network deployment
 
Embedded and Reliable Computer Vision
Embedded and Reliable Computer VisionEmbedded and Reliable Computer Vision
Embedded and Reliable Computer Vision
 
1030: NVIDIA GRID 2.0
1030: NVIDIA GRID 2.01030: NVIDIA GRID 2.0
1030: NVIDIA GRID 2.0
 
Qualcomm Hexagon SDK: Optimize Your Multimedia Solutions
Qualcomm Hexagon SDK: Optimize Your Multimedia SolutionsQualcomm Hexagon SDK: Optimize Your Multimedia Solutions
Qualcomm Hexagon SDK: Optimize Your Multimedia Solutions
 
PG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry KozlovPG-4039, RapidFire API, by Dmitry Kozlov
PG-4039, RapidFire API, by Dmitry Kozlov
 
全面保護企業的關鍵智慧資產
全面保護企業的關鍵智慧資產全面保護企業的關鍵智慧資產
全面保護企業的關鍵智慧資產
 
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloudPart 3 Maximizing the utilization of GPU resources on-premise and in the cloud
Part 3 Maximizing the utilization of GPU resources on-premise and in the cloud
 
Tegra 4i expands the market
Tegra 4i expands the marketTegra 4i expands the market
Tegra 4i expands the market
 
NVIDIA PRO VR DAY 2017 基調講演
NVIDIA PRO VR DAY 2017 基調講演NVIDIA PRO VR DAY 2017 基調講演
NVIDIA PRO VR DAY 2017 基調講演
 
Accelerated Computing: The Path Forward
Accelerated Computing: The Path ForwardAccelerated Computing: The Path Forward
Accelerated Computing: The Path Forward
 
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
Bring Intelligent Motion Using Reinforcement Learning Engines | SIGGRAPH 2019...
 
GTC 2017 オートモーティブ最新情報
GTC 2017 オートモーティブ最新情報GTC 2017 オートモーティブ最新情報
GTC 2017 オートモーティブ最新情報
 
HSA-4146, Creating Smarter Applications and Systems Through Visual Intelligen...
HSA-4146, Creating Smarter Applications and Systems Through Visual Intelligen...HSA-4146, Creating Smarter Applications and Systems Through Visual Intelligen...
HSA-4146, Creating Smarter Applications and Systems Through Visual Intelligen...
 
Streamed Cloud Gaming Solutions for Android* and PC Games
Streamed Cloud Gaming Solutions for Android* and PC GamesStreamed Cloud Gaming Solutions for Android* and PC Games
Streamed Cloud Gaming Solutions for Android* and PC Games
 
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
Build a Deep Learning Video Analytics Framework | SIGGRAPH 2019 Technical Ses...
 
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
Keynote (Phil Rogers) - The Programmers Guide to Reaching for the Cloud - by ...
 
How to Choose Mobile Workstation? VR Ready
How to Choose Mobile Workstation? VR ReadyHow to Choose Mobile Workstation? VR Ready
How to Choose Mobile Workstation? VR Ready
 
Breaking New Frontiers in Robotics and Edge Computing with AI
Breaking New Frontiers in Robotics and Edge Computing with AIBreaking New Frontiers in Robotics and Edge Computing with AI
Breaking New Frontiers in Robotics and Edge Computing with AI
 
Data Science Week 2016. NVIDIA. "Платформы и инструменты для реализации систе...
Data Science Week 2016. NVIDIA. "Платформы и инструменты для реализации систе...Data Science Week 2016. NVIDIA. "Платформы и инструменты для реализации систе...
Data Science Week 2016. NVIDIA. "Платформы и инструменты для реализации систе...
 
Jetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous MachinesJetson AGX Xavier and the New Era of Autonomous Machines
Jetson AGX Xavier and the New Era of Autonomous Machines
 

Viewers also liked

Revista Culturism nr.108 (5/2000)
Revista Culturism nr.108 (5/2000)Revista Culturism nr.108 (5/2000)
Revista Culturism nr.108 (5/2000)
Redis Nutritie
 

Viewers also liked (7)

Sistema evaluacion curso
Sistema evaluacion cursoSistema evaluacion curso
Sistema evaluacion curso
 
Nerd valentine
Nerd valentineNerd valentine
Nerd valentine
 
Revista Culturism nr.108 (5/2000)
Revista Culturism nr.108 (5/2000)Revista Culturism nr.108 (5/2000)
Revista Culturism nr.108 (5/2000)
 
Uncovering the Elusive HIV Capsid with Kepler GPUs Running NAMD and VMD
Uncovering the Elusive HIV Capsid with Kepler GPUs Running NAMD and VMDUncovering the Elusive HIV Capsid with Kepler GPUs Running NAMD and VMD
Uncovering the Elusive HIV Capsid with Kepler GPUs Running NAMD and VMD
 
Boletín Tierra nº 218.- 31/marzo/2014
Boletín Tierra nº 218.- 31/marzo/2014Boletín Tierra nº 218.- 31/marzo/2014
Boletín Tierra nº 218.- 31/marzo/2014
 
130 cdm 22_cdm22
130 cdm 22_cdm22130 cdm 22_cdm22
130 cdm 22_cdm22
 
Pauta de diagramación formato El Peruano
Pauta de diagramación formato El PeruanoPauta de diagramación formato El Peruano
Pauta de diagramación formato El Peruano
 

Similar to Cuda meetup presentation 5

TECHNICAL PAPER PRESENTATION copy
TECHNICAL PAPER PRESENTATION copyTECHNICAL PAPER PRESENTATION copy
TECHNICAL PAPER PRESENTATION copy
Bhargav Ramesh
 

Similar to Cuda meetup presentation 5 (20)

Cuda
CudaCuda
Cuda
 
Accelerated SDN in Azure
Accelerated SDN in AzureAccelerated SDN in Azure
Accelerated SDN in Azure
 
Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)Introduction to Software Defined Visualization (SDVis)
Introduction to Software Defined Visualization (SDVis)
 
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision System
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision SystemHai Tao at AI Frontiers: Deep Learning For Embedded Vision System
Hai Tao at AI Frontiers: Deep Learning For Embedded Vision System
 
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mãoWebinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
Webinar: NVIDIA JETSON – A Inteligência Artificial na palma de sua mão
 
System Level Solutions (SLS) Introduction
System Level Solutions (SLS) IntroductionSystem Level Solutions (SLS) Introduction
System Level Solutions (SLS) Introduction
 
Accelerating Data Science With GPUs
Accelerating Data Science With GPUsAccelerating Data Science With GPUs
Accelerating Data Science With GPUs
 
EPSRC CDT Conference
EPSRC CDT ConferenceEPSRC CDT Conference
EPSRC CDT Conference
 
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習NVIDIA DGX-1 超級電腦與人工智慧及深度學習
NVIDIA DGX-1 超級電腦與人工智慧及深度學習
 
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
Fórum E-Commerce Brasil | Tecnologias NVIDIA aplicadas ao e-commerce. Muito a...
 
The Visual Computing Company
The Visual Computing CompanyThe Visual Computing Company
The Visual Computing Company
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
E3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - SundanceE3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - Sundance
 
LEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous HardwareLEGaTO Heterogeneous Hardware
LEGaTO Heterogeneous Hardware
 
Accelerating Innovation from Edge to Cloud
Accelerating Innovation from Edge to CloudAccelerating Innovation from Edge to Cloud
Accelerating Innovation from Edge to Cloud
 
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoTVEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
 
TECHNICAL PAPER PRESENTATION copy
TECHNICAL PAPER PRESENTATION copyTECHNICAL PAPER PRESENTATION copy
TECHNICAL PAPER PRESENTATION copy
 
NVIDIA vGPU - Introduction to NVIDIA Virtual GPU
NVIDIA vGPU - Introduction to NVIDIA Virtual GPUNVIDIA vGPU - Introduction to NVIDIA Virtual GPU
NVIDIA vGPU - Introduction to NVIDIA Virtual GPU
 
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
Microsoft Project Olympus AI Accelerator Chassis (HGX-1)
 

Recently uploaded

1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
QucHHunhnh
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
SoniaTolstoy
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 

Recently uploaded (20)

9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Measures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SDMeasures of Dispersion and Variability: Range, QD, AD and SD
Measures of Dispersion and Variability: Range, QD, AD and SD
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 

Cuda meetup presentation 5

  • 1. CUDA PARELLEL PROGRAMMING AND GAMING MEETUP RIGA, LATVIA, JANUARY 12, 2015
  • 3. • Personal Introductions; • CES Nvidia news; • Jetson TK1; • Anouncements; • Summary Agenda
  • 4. Who am I What I am doing What is my parallel programming experience [CUDA] experience Introduction
  • 5. Exponential Technologies Drones AUTOMOTIVE 3D Printing Artificial intelligence Biotechnology Design ROBOTICS ComputerVision
  • 7.
  • 8. Parrot drone + Jetson TK1
  • 9.
  • 10.
  • 11.
  • 13.
  • 14. The Startup University Latvia Tech and Entrepreneurship meetup network
  • 15. Riga BioTechnology Meetup UI/UX Riga Meetup Riga Drone Meetup Riga Mobile App Developer Meetup Kick-off Meetup December 12, THE Mill December Meetup December 15, THE Mill Drone Kick-off Meetup December 16, THE Mill Kick-off Meetup December 17, THE Mill
  • 16. Riga Startup: Idea to IPO CUDA parallel programming and gaming meetup Riga 3D Printing Riga Meetup Bitcoin and Cryptocurrencies Meetup Find you Co-founder December 18, THE Mill December CUDA Meetup December 22, THE Mill January 3D Printing January 8, RTU Design Factory JanuARY Meetup jANUARy 26, THE Mill
  • 21.
  • 22.
  • 26.
  • 28. Install the NVIDIA Linux driver binary release on your target located in: ${HOME}/NVIDIA-INSTALLER Step 1) Change directories into the NVIDIA installation directory: cd ${HOME}/NVIDIA-INSTALLER Step 2) Run the installer script to extract and install the Linux driver binary release: sudo ./installer.sh Step 3) Reboot the system to have the graphical desktop UI come up.
  • 29. CUDA SDK Demo Samples • Particles • Nbody • Smokeparticles • waves
  • 30.
  • 32.
  • 37. Jetson TEGRA TK1 Tegra K1 SOC • Kepler GPU with 192 CUDA cores • 4-Plus-1 quad-core ARM Cortex A15 CPU • 2 GB x16 memory with 64 bit width • 16 GB 4.51 eMMC memory • 1 Half mini-PCIE slot • 1 Full size SD/MMC connector • 1 Full-size HDMI port • 1 USB 2.0 port, micro AB • 1 USB 3.0 port, A • 1 RS232 serial port • 1 ALC5639 Realtek Audio codec with Mic in and Line out • 1 RTL8111GS Realtek GigE LAN • 1 SATA data port • SPI 4MByte boot flash
  • 39. NVENC Based H.264 Encoding for Virtual Machine Based Monitor Wall Architecture
    R. Bundulis (rudolfs.bundulis@lu.lv), G. Arnicans (guntis.arnicans@lu.lv), and R. Gailums (rihards.gailums@rhtu.edu.lv), University of Latvia / Riga High Tech University, Latvia
    Introduction
    • The IT industry is seeing growing demand for high-resolution display surfaces
    • Use cases for such surfaces include satellite and map data, x-ray and microscope images, multimedia, CCTV, etc.
    • Existing solutions are not scalable, do not offer hardware abstraction, and suffer from wiring limitations
    Currently Popular Monitor Wall Architectures
    • Display wall architecture where each output of the GPU maps to a tile on the display wall [diagram: OS, GPUs, monitors]
      Pro: OS can natively manage the displays
      Con: Power consumption; supported monitor count is limited by the output count of the GPUs and by the expansion slots for the GPUs on the motherboard; deployment is limited by wiring
    • Display wall architecture where each output of the GPU is split/upscaled among the tiles on the display wall [diagram: OS, GPU, splitter / scaler, monitors]
      Pro: Software complexity is reduced since it does not have to be multiple-monitor aware
      Con: Small resolution and DPI; the visualization is not displayed in its native resolution
      Con: Expensive
    Proposed Virtual Machine Based Monitor Wall Architecture
    • The proposed display wall architecture, where each output of a virtual GPU maps to a tile on the display wall and is transmitted as an H.264 stream over LAN [diagram: host machine, virtual machines with virtual GPUs, physical GPU, H.264/RTP/LAN]
      Pro: Scalable; the host machine can run multiple virtual machines, and multiple virtualized GPUs map to physical GPUs to maximize efficiency
      Pro: The LAN connection to the display wall removes the wire length limitations imposed by DVI/HDMI cables
      Pro: Total resolution of the wall goes beyond what can be achieved using physical hardware
      Con: Lossy compression
      Con: No Direct3D or OpenGL support
    • The host machine collects the framebuffer data from the virtual machine GPUs and performs H.264 encoding of the video stream on the physical host GPU. The architecture therefore relies heavily on a fast hardware H.264 encoder, which leaves the CPU fully available to the hosted virtual machines
    • Non-FPS-intensive use cases allow a great number of virtual monitors to be hosted on a single physical GPU, thus reducing the power consumption
    • [Photo: virtual machine based monitor wall running Google Maps inside the Chrome web browser on 16 tiles at 1920x1080 pixels each, giving a total resolution of 32 megapixels. Each tile has a dedicated LAN connection and H.264 decoder]
    Why NVENC?
    Currently there are two main alternatives to be used as the H.264 encoder in this architecture: Intel Quick Sync and NVENC. NVENC is more feasible because:
    • The total encoding power can be increased by stacking multiple GPUs that support NVENC without penalties, while not all Intel Quick Sync GPUs have built-in video memory, so scaling these cards introduces the performance penalty of using system memory
    • NVENC does not put any limitations on other components of the system, while Intel Quick Sync supports a limited number of CPUs
    • Current benchmarks seem to show that the overall FPS performance for a single GPU (which is the main criterion for this architecture) is better for NVENC than for Intel Quick Sync
    Scalability
    • The graph below demonstrates the scalability in terms of the maximum number of connected monitors for the traditional architecture versus the proposed one on a Quadro K4000 card that has 4 outputs
    • [Bar chart: Scalability of supported monitor count; axis 0 to 100; bars "Using Video…" and "Maximum"]
    Conclusions
    • The current experiments show that this architecture is very feasible for non-FPS-intensive use cases where the display wall can be driven by a single physical GPU
    • The total resolution provided by this architecture, even using the currently available compression technology, greatly exceeds the resolutions of existing solutions; the resolution can be expected to grow in the future
    • The architecture itself scales very well. It is limited mainly by OS support for multiple monitors (this can be overcome by simulating a single high-resolution display in the virtual machine that spans the whole resolution of the physical wall) and by the possibility to stack multiple GPUs in the host system
    • Future work should focus on the ability to virtualize OpenGL and Direct3D to remove the advantages of non-virtualized architectures
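The total resolution quoted on the poster follows from simple arithmetic: tile count times tile width times tile height. A minimal sketch of that calculation is below; `total_wall_pixels` is a hypothetical helper name chosen for illustration and does not come from the poster. For 16 tiles at 1920x1080 it gives 33,177,600 pixels, which the poster quotes as 32 megapixels.

```cpp
#include <cstdint>

// Total pixel count of a display wall built from identical tiles.
// total_wall_pixels is a hypothetical helper name, not taken from the poster.
constexpr std::uint64_t total_wall_pixels(std::uint64_t tiles,
                                          std::uint64_t width,
                                          std::uint64_t height) {
    return tiles * width * height;
}

// 16 tiles at 1920x1080 each, as in the poster's Google Maps demo.
static_assert(total_wall_pixels(16, 1920, 1080) == 33177600ULL,
              "16 Full HD tiles");
```

The same helper can be used to estimate how the wall resolution grows as tiles are added, since the tile count is the only factor the proposed architecture scales.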
  • 40. For startups by Meetup members: • $1800 per year of free Azure cloud services • Free Microsoft software and tools
  • 41. Latvijas Garantiju Aģentūra: a government corporation that supports Latvian entrepreneurs and helps them realise their business ideas.
  • 47. Add GPUs: Accelerate Science Applications • CPU + GPU © NVIDIA 2013
  • 48. Small Changes, Big Speed-up • Application code: GPU + CPU • Use GPU to parallelize compute-intensive functions • Rest of sequential CPU code
  • 49. Fastest Performance on Scientific Applications • Tesla K20X speed-up over Sandy Bridge CPUs • [Bar chart, speed-up axis 0.0x to 20.0x: AMBER (Molecular Dynamics), SPECFEM3D (Earth Science), Chroma (Physics), MATLAB (FFT)* (Engineering)] • CPU results: dual socket E5-2687w, 3.10 GHz; GPU results: dual socket E5-2687w + 2 Tesla K20X GPUs • *MATLAB results comparing one i7-2600K CPU alone vs. with a Tesla K20 GPU • Disclaimer: Non-NVIDIA implementations may not have been fully optimized
  • 50. Why Does Computing Perf/Watt Matter? • Traditional CPUs are not economically feasible: 2.3 PFlops at 7.0 megawatts, the same 7.0 megawatts as 7000 homes • CPU: optimized for serial tasks • GPU accelerator: optimized for many parallel tasks; 10x performance/socket; > 5x energy efficiency • Era of GPU-accelerated computing is here
  • 51. World’s Fastest, Most Energy Efficient Accelerator • [Bar charts: SGEMM (TFLOPS), axis 0.0 to 3.0, and DGEMM (TFLOPS), axis 0.0 to 1.5, for Tesla K20X, Tesla K20, Xeon CPU E5-2690, and Xeon Phi 225W] • Tesla K20X vs Xeon CPU: 8x faster SGEMM, 6x faster DGEMM • Tesla K20X vs Xeon Phi: 90% faster SGEMM, 60% faster DGEMM
  • 53. CUDA Parallel Computing Platform • Programming approaches: Libraries (“drop-in” acceleration), OpenACC directives (easily accelerate apps), programming languages (maximum flexibility) • Development environment: Nsight IDE (Linux, Mac and Windows; GPU debugging and profiling), CUDA-GDB debugger, NVIDIA Visual Profiler • Open compiler tool chain: enables compiling new languages to CUDA platform, and CUDA languages to other architectures • Hardware capabilities: SMX, Dynamic Parallelism, HyperQ, GPUDirect • www.nvidia.com/getcuda
  • 55. 3 Ways to Accelerate Applications • Libraries: “drop-in” acceleration • OpenACC directives: easily accelerate applications • Programming languages: maximum flexibility
  • 56. Libraries: Easy, High-Quality Acceleration • Ease of use: Using libraries enables GPU acceleration without in-depth knowledge of GPU programming • “Drop-in”: Many GPU-accelerated libraries follow standard APIs, thus enabling acceleration with minimal code changes • Quality: Libraries offer high-quality implementations of functions encountered in a broad range of applications • Performance: NVIDIA libraries are tuned by experts
  • 57. Some GPU-accelerated Libraries • NVIDIA cuBLAS • NVIDIA cuRAND • NVIDIA cuSPARSE • NVIDIA NPP • Vector Signal Image Processing • GPU Accelerated Linear Algebra • Matrix Algebra on GPU and Multicore • NVIDIA cuFFT • C++ STL Features for CUDA • IMSL Library • Building-block Algorithms for CUDA • ArrayFire Matrix Computations • Sparse Linear Algebra
  • 58. 3 Steps to CUDA-accelerated application • Step 1: Substitute library calls with equivalent CUDA library calls: saxpy ( … ) → cublasSaxpy ( … ) • Step 2: Manage data locality - with CUDA: cudaMalloc(), cudaMemcpy(), etc. - with CUBLAS: cublasAlloc(), cublasSetVector(), etc. • Step 3: Rebuild and link the CUDA-accelerated library: nvcc myobj.o -lcublas
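When substituting a library call as in Step 1, it helps to keep a plain host-side reference around to check the accelerated result against. The sketch below is such a reference in standard C++, assuming unit strides: it computes y = alpha * x + y, which is the operation a BLAS saxpy call performs. The name `saxpy_reference` is hypothetical and is not part of cuBLAS; no GPU or CUDA library is used here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for SAXPY with unit strides: y[i] = alpha * x[i] + y[i].
// saxpy_reference is a hypothetical helper name; it is not a cuBLAS function.
void saxpy_reference(float alpha,
                     const std::vector<float>& x,
                     std::vector<float>& y) {
    assert(x.size() == y.size());  // both vectors must have the same length
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = alpha * x[i] + y[i];
    }
}
```

A typical use is to run the reference on a small input, run the GPU version on the same input, copy the result back to the host, and compare element by element within a floating-point tolerance.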
  • 59. Explore the CUDA (Libraries) Ecosystem • CUDA Tools and Ecosystem described in detail on NVIDIA Developer Zone: developer.nvidia.com/cuda-tools-ecosystem
  • 61. OpenACC Directives • Your original Fortran or C code, plus simple compiler hints • Compiler parallelizes code • Works on many-core GPUs & multicore CPUs
    Program myscience
      ... serial code ...                (CPU)
    !$acc kernels                        (OpenACC compiler hint)
      do k = 1,n1
        do i = 1,n2
          ... parallel code ...          (GPU)
        enddo
      enddo
    !$acc end kernels
      ...
    End Program myscience
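The same pattern can be sketched in C++. The example below is an illustration under assumptions, not code from the slide: `scale_grid`, the flat row-major layout, and the `copy` data clause are all choices made here. An OpenACC-enabled compiler may offload the loop nest to the GPU; a compiler without OpenACC support ignores the pragma (possibly with a warning) and runs the same loops serially on the CPU, which is the sense in which the directive is only a hint.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Multiplies every element of a row-major n1 x n2 grid by factor, in place.
// scale_grid is a hypothetical function name used only for illustration.
void scale_grid(std::vector<float>& grid, std::size_t n1, std::size_t n2,
                float factor) {
    assert(grid.size() == n1 * n2);
    float* data = grid.data();
    // Compiler hint: with OpenACC enabled, the loop nest below may be
    // offloaded; the copy clause (an assumption of this sketch) moves the
    // array to the device and back. Without OpenACC the pragma is ignored.
    #pragma acc kernels copy(data[0:n1*n2])
    for (std::size_t k = 0; k < n1; ++k) {
        for (std::size_t i = 0; i < n2; ++i) {
            data[k * n2 + i] *= factor;
        }
    }
}
```

The serial and accelerated builds share one source file, so the existing code structure does not have to change to try the directive.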
  • 62. OpenACC: The Standard for GPU Directives • Easy: Directives are the easy path to accelerate compute-intensive applications • Open: OpenACC is an open GPU directives standard, making GPU programming straightforward and portable across parallel and multi-core processors • Powerful: GPU Directives allow complete access to the massive parallel power of a GPU
  • 63. Directives: Easy & Powerful • Real-Time Object Detection (Global Manufacturer of Navigation Systems): 5x in 40 Hours • Valuation of Stock Portfolios using Monte Carlo (Global Technology Consulting Company): 2x in 4 Hours • Interaction of Solvents and Biomolecules (University of Texas at San Antonio): 5x in 8 Hours • “Optimizing code with directives is quite easy, especially compared to CPU threads or writing CUDA kernels. The most important thing is avoiding restructuring of existing code for production applications.” -- Developer at the Global Manufacturer of Navigation Systems
  • 64. Start Now with OpenACC Directives • Sign up for a free trial of the directives compiler now! • Free trial license to PGI Accelerator Tools for quick ramp • www.nvidia.com/gpudirectives
  • 66. GPU Programming Languages • Fortran: OpenACC, CUDA Fortran • C: OpenACC, CUDA C • C++: Thrust, CUDA C++ • Python: PyCUDA, Copperhead • F#: Alea.cuBase • Numerical analytics: MATLAB, Mathematica, LabVIEW
  • 67. Rapid Parallel C++ Development
    // generate 32M random numbers on host
    thrust::host_vector<int> h_vec(32 << 20);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);
    // transfer data to device (GPU)
    thrust::device_vector<int> d_vec = h_vec;
    // sort data on device
    thrust::sort(d_vec.begin(), d_vec.end());
    // transfer data back to host
    thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
    • Resembles C++ STL • High-level interface • Enhances developer productivity • Enables performance portability between GPUs and multicore CPUs • Flexible • CUDA, OpenMP, and TBB backends • Extensible and customizable • Integrates with existing software • Open source • http://developer.nvidia.com/thrust or http://thrust.googlecode.com
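To see how closely Thrust resembles the C++ STL, the same generate-then-sort flow can be written with only the standard library. This is a host-only sketch for comparison, not a Thrust program: `make_sorted_random` is a hypothetical helper name, and there are no device transfers because everything stays in host memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Host-only analogue of the Thrust snippet, using only the C++ standard
// library. make_sorted_random is a hypothetical helper name.
std::vector<int> make_sorted_random(std::size_t count) {
    std::vector<int> values(count);                          // like thrust::host_vector
    std::generate(values.begin(), values.end(), std::rand);  // like thrust::generate
    std::sort(values.begin(), values.end());                 // like thrust::sort, but on the CPU
    return values;
}
```

Each standard-library call has a Thrust counterpart with the same iterator-based shape, so moving the sort to the GPU mostly comes down to switching the container to a device vector.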
  • 68. Learn More • These languages are supported on all CUDA-capable GPUs. You might already have a CUDA-capable GPU in your laptop or desktop PC! • CUDA C/C++: http://developer.nvidia.com/cuda-toolkit • Thrust C++ Template Library: http://developer.nvidia.com/thrust • CUDA Fortran: http://developer.nvidia.com/cuda-toolkit • GPU.NET: http://tidepowerd.com • PyCUDA (Python): http://mathema.tician.de/software/pycuda • MATLAB: http://www.mathworks.com/discovery/matlab-gpu.html • Mathematica: http://www.wolfram.com/mathematica/new-in-8/cuda-and-opencl-support/
  • 69. Getting Started • Download CUDA Toolkit & SDK: www.nvidia.com/getcuda • Nsight IDE (Eclipse or Visual Studio): www.nvidia.com/nsight • Programming Guide/Best Practices: docs.nvidia.com • Questions: NVIDIA Developer forums: devtalk.nvidia.com; search or ask on: www.stackoverflow.com/tags/cuda • General: www.nvidia.com/cudazone
  • 70. Intro to CUDA - An introduction, how-to, to NVIDIA's GPU parallel: https://www.youtube.com/watch?v=IzU4AVcMFys • Mythbusters Demo GPU versus CPU: http://youtu.be/-P28LKWTzrI
  • 75. NVIDIA GTX 750 Ti • Nvidia Maxwell technology • Cost: 170 USD • Only 60 W of power, no dedicated power connections • 250 MHash/s • Versus: • Nvidia GTX 780: 350 MHash/s + power consumption • Nvidia Tesla K40: 560 MHash/s + power consumption
  • 76. Latvian CUDA & parallel programming ecosystem • Next meetups, frequency • Speakers • Topics • Group marketing channels