15. Riga BioTechnology Meetup: Kick-off Meetup, December 12, THE Mill
UI/UX Riga Meetup: December Meetup, December 15, THE Mill
Riga Drone Meetup: Drone Kick-off Meetup, December 16, THE Mill
Riga Mobile App Developer Meetup: Kick-off Meetup, December 17, THE Mill
16. Riga Startup: Idea to IPO: Find your Co-founder, December 18, THE Mill
CUDA parallel programming and gaming meetup Riga: December CUDA Meetup, December 22, THE Mill
3D Printing Riga Meetup: January 3D Printing, January 8, RTU Design Factory
Bitcoin and Cryptocurrencies Meetup: January Meetup, January 26, THE Mill
28. Install the NVIDIA Linux driver binary release on your target located in:
${HOME}/NVIDIA-INSTALLER
Step 1)
Change directories into the NVIDIA installation directory:
cd ${HOME}/NVIDIA-INSTALLER
Step 2)
Run the installer script to extract and install the Linux driver binary release:
sudo ./installer.sh
Step 3)
Reboot the system to have the graphical desktop UI come up.
37. Jetson TEGRA TK1
Tegra K1 SOC
• Kepler GPU with 192 CUDA cores
• 4-Plus-1 quad-core ARM Cortex A15 CPU
• 2 GB x16 memory with 64-bit width
• 16 GB eMMC 4.51 memory
• 1 Half mini-PCIe slot
• 1 Full-size SD/MMC connector
• 1 Full-size HDMI port
• 1 USB 2.0 port, micro AB
• 1 USB 3.0 port, A
• 1 RS232 serial port
• 1 ALC5639 Realtek Audio codec with Mic in and Line out
• 1 RTL8111GS Realtek GigE LAN
• 1 SATA data port
• 4 MByte SPI boot flash
39. NVENC Based H.264 Encoding for Virtual Machine Based Monitor Wall Architecture
R. Bundulis (rudolfs.bundulis@lu.lv), G. Arnicans (guntis.arnicans@lu.lv), and R. Gailums (rihards.gailums@rhtu.edu.lv)
University of Latvia / Riga High Tech University, Latvia
Introduction
• The IT industry is seeing growing demand for high-resolution display surfaces
• Use cases for such surfaces include satellite and map data, X-ray and microscope images, multimedia, CCTV, etc.
• Existing solutions are not scalable, do not offer hardware abstraction, and suffer from wiring limitations
Currently Popular Monitor Wall Architectures
Display wall architecture where each output of the GPU maps to a tile on the display wall
[Diagram: OS, two GPUs, four monitors]
• Pro: The OS can natively manage the displays
• Con: Power consumption; the supported monitor count is limited by the output count of the GPUs and by the expansion slots for the GPUs on the motherboard; deployment is limited by wiring
Display wall architecture where each output of the GPU is split/upscaled among the tiles on the display wall
[Diagram: OS, GPU, splitter/scaler, four monitors]
• Pro: Software complexity is reduced since it does not have to be multiple-monitor aware
• Con: Small resolution and DPI; the visualization is not displayed in its native resolution
• Con: Expensive
Proposed Virtual Machine Based Monitor Wall Architecture
The proposed display wall architecture, where each output of a virtual GPU maps to a tile on the display wall and is transmitted as an H.264 stream over LAN
[Diagram: host machine, virtual machines with virtual GPUs, physical GPU, H.264/RTP/LAN links to the tiles]
• Pro: Scalable; the host machine can run multiple virtual machines, and multiple virtualized GPUs map to physical GPUs to maximize efficiency
• Pro: The LAN connection to the display wall removes the wire length limitations imposed by DVI/HDMI cables
• Pro: The total resolution of the wall goes beyond what can be achieved using physical hardware
• Con: Lossy compression
• Con: No Direct3D or OpenGL support
• The host machine collects the framebuffer data from the virtual machine GPUs and performs H.264 encoding of the video stream on the physical host GPU; the architecture therefore relies heavily on a fast hardware H.264 encoder, which leaves the CPU fully available to the hosted virtual machines
• Non-FPS-intensive use cases allow a great number of virtual monitors to be hosted on a single physical GPU, thus reducing power consumption
Virtual machine based monitor wall running Google Maps inside the Chrome web browser on 16 tiles at 1920x1080 pixels each, giving a total resolution of about 33 megapixels (33,177,600 pixels)
Each tile has a dedicated LAN connection and H.264 decoder
Why NVENC?
Currently there are two main alternatives for the H.264 encoder in this architecture: Intel Quick Sync and NVENC. NVENC is more feasible because:
• The total encoding power can be increased by stacking multiple GPUs that support NVENC without penalties, while not all Intel Quick Sync GPUs have built-in video memory, so scaling these cards introduces the performance penalty of using system memory
• NVENC does not put any limitations on other components of the system, while Intel Quick Sync supports a limited number of CPUs
• Current benchmarks seem to show that the overall FPS performance for a single GPU (which is the main criterion for this architecture) is better for NVENC than for Intel Quick Sync
Scalability
Scalability of supported monitor count
The graph demonstrates the scalability possibilities in terms of the maximum possible number of connected monitors for the traditional architecture versus the proposed one on a Quadro K4000 card that has 4 outputs.
[Graph: series "Using Video…" and "Maximum"; axis from 0 to 100]
Conclusions
• The current experiments show that this architecture is very feasible for non-FPS-intensive use cases where the display wall can be driven by a single physical GPU
• The total resolution provided by this architecture, even with currently available compression technology, greatly exceeds the resolutions of existing solutions, and it is expected to grow in the future
• The architecture itself scales very well; it is limited mainly by OS support for multiple monitors (this can be overcome by simulating a single high-resolution display in the virtual machine that spans the whole resolution of the physical wall) and by the possibility of stacking multiple GPUs in the host system
• Future work should focus on the ability to virtualize OpenGL and Direct3D to remove the advantages of non-virtualized architectures
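The resolution arithmetic for the 16-tile wall on this slide can be checked with a short sketch. The function names `wall_pixels` and `raw_tile_bitrate` are hypothetical, and the 24 bits per pixel and 30 frames per second used in the note after the code are illustrative assumptions, not figures from the poster.

```cpp
#include <cassert>
#include <cstdint>

// Total pixel count of a wall built from identical tiles.
std::uint64_t wall_pixels(std::uint64_t tiles,
                          std::uint64_t width,
                          std::uint64_t height) {
    return tiles * width * height;
}

// Uncompressed bit rate of a single tile, in bits per second.
// bits_per_pixel and fps are assumptions supplied by the caller;
// the poster does not state them.
std::uint64_t raw_tile_bitrate(std::uint64_t width,
                               std::uint64_t height,
                               std::uint64_t bits_per_pixel,
                               std::uint64_t fps) {
    assert(bits_per_pixel > 0 && fps > 0);
    return width * height * bits_per_pixel * fps;
}
```

For 16 tiles of 1920x1080, `wall_pixels(16, 1920, 1080)` returns 33,177,600. Under the assumed 24 bits per pixel and 30 frames per second, `raw_tile_bitrate(1920, 1080, 24, 30)` returns 1,492,992,000 bits per second for a single uncompressed tile. That would exceed a 1 Gbit/s link, if the tiles were wired with gigabit Ethernet (also an assumption), which is one way to see why each tile receives a compressed H.264 stream.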
40. For Startups by Meetup members:
$1800 per year of FREE Azure cloud services
Free Microsoft software and tools
67. Rapid Parallel C++ Development
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/generate.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdlib>

int main() {
    // generate 32M random numbers on host
    thrust::host_vector<int> h_vec(32 << 20);
    thrust::generate(h_vec.begin(), h_vec.end(), rand);
    // transfer data to device (GPU)
    thrust::device_vector<int> d_vec = h_vec;
    // sort data on device
    thrust::sort(d_vec.begin(), d_vec.end());
    // transfer data back to host
    thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());
    return 0;
}
• Resembles C++ STL
• High-level interface
• Enhances developer productivity
• Enables performance portability between GPUs and multicore CPUs
• Flexible
• CUDA, OpenMP, and TBB backends
• Extensible and customizable
• Integrates with existing software
• Open source
http://developer.nvidia.com/thrust or http://thrust.googlecode.com
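To illustrate the "Resembles C++ STL" point, here is a host-only sketch of the same generate, copy, sort, copy-back steps written against the standard library. The function name `generate_and_sort` is hypothetical, and the second `std::vector` only stands in for the `thrust::device_vector<int>` of the Thrust example; nothing here runs on a GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Host-only counterpart of the Thrust example: generate n random integers,
// copy them into a second container, sort the copy, and copy it back.
std::vector<int> generate_and_sort(std::size_t n) {
    assert(n > 0);
    std::vector<int> h_vec(n);
    std::generate(h_vec.begin(), h_vec.end(), std::rand);

    // Stands in for the host-to-device transfer.
    std::vector<int> work = h_vec;
    // Stands in for the sort on the device.
    std::sort(work.begin(), work.end());
    // Stands in for the transfer back to the host.
    std::copy(work.begin(), work.end(), h_vec.begin());
    return h_vec;
}
```

Calling `generate_and_sort(1000)` returns 1000 integers in ascending order. In the Thrust version the algorithm calls keep the same shape; swapping in a device container is what moves the work to the GPU.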
75. NVIDIA GTX 750Ti
• NVIDIA Maxwell architecture
• Cost: 170 USD
• Only 60 W of power, no dedicated power connector
• 250 MHash/s
vs.
• NVIDIA GTX 780: 350 MHash/s, but with higher power consumption
• NVIDIA Tesla K40: 560 MHash/s, but with higher power consumption
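The comparison on this slide comes down to hash rate per watt and per dollar. A minimal sketch of that arithmetic follows, using hypothetical helper names `mhash_per_watt` and `mhash_per_usd`. Only the GTX 750 Ti can be evaluated, because the slide gives no power or price figures for the GTX 780 or the Tesla K40.

```cpp
#include <cassert>

// Hash rate per watt, in MHash/s per W.
double mhash_per_watt(double mhash_per_s, double watts) {
    assert(watts > 0.0);
    return mhash_per_s / watts;
}

// Hash rate per unit cost, in MHash/s per USD.
double mhash_per_usd(double mhash_per_s, double usd) {
    assert(usd > 0.0);
    return mhash_per_s / usd;
}
```

With the slide's figures for the GTX 750 Ti (250 MHash/s, 60 W, 170 USD), `mhash_per_watt(250.0, 60.0)` is about 4.17 and `mhash_per_usd(250.0, 170.0)` is about 1.47.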
76. Latvian CUDA & parallel programming ecosystem
Next meetups, frequency
Speakers
Topics
Group marketing channels