PCIe Gen 3.0 Presentation @ 4th FPGA Camp – FPGA Central
PCIe Gen3 presentation by PLDA at 4th FPGA Camp in Santa Clara, CA. For more details visit http://www.fpgacentral.com/fpgacamp or http://www.fpgacentral.com
SHARP: In-Network Scalable Hierarchical Aggregation and Reduction Protocol – inside-BigData.com
In this deck from the 2019 Stanford HPC Conference, Devendar Bureddy from Mellanox presents: SHARP: In-Network Scalable Hierarchical Aggregation and Reduction Protocol.
"Increased system size and a greater reliance on system parallelism to meet computational needs require innovative system architectures to meet the simulation challenges. As a step toward a new class of co-processors (intelligent network devices that manipulate data traversing the data-center network), SHARP technology is designed to offload collective operation processing to the network.
This tutorial will provide an overview of SHARP technology, its integration with MPI, the SHARP software components, and a live example of running MPI collectives."
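The core idea behind SHARP, aggregating partial results in a tree of network switches so that each host sends its contribution once and receives the final result once, can be illustrated with a toy software model of a hierarchical allreduce. This is an illustrative sketch only, not Mellanox's implementation; the function names and the `fanout` parameter are hypothetical:

```python
from functools import reduce

def hierarchical_allreduce(host_values, fanout=2, op=lambda a, b: a + b):
    """Model an in-network reduction tree: each 'switch' aggregates the
    partial results of up to `fanout` children, so each host exchanges
    O(1) messages instead of generating all-to-all traffic."""
    level = list(host_values)           # leaf level: one value per host
    while len(level) > 1:
        # each switch at this level reduces a group of `fanout` children
        level = [reduce(op, level[i:i + fanout])
                 for i in range(0, len(level), fanout)]
    total = level[0]                    # the root holds the aggregate
    return [total] * len(host_values)   # broadcast the result back down

print(hierarchical_allreduce([1, 2, 3, 4, 5]))  # [15, 15, 15, 15, 15]
```

The same shape works for any associative operator (sum, max, min), which is why collectives such as MPI_Allreduce map naturally onto a switch hierarchy.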
Devendar Bureddy is a Staff Engineer at Mellanox Technologies and has been instrumental in building several key technologies such as SHARP and HCOLL. Prior to joining Mellanox, he was a software developer at The Ohio State University in the Network-Based Computing Laboratory led by Dr. D. K. Panda, where he was involved in the design and development of MVAPICH2, an open-source high-performance implementation of MPI over InfiniBand and 10GigE/iWARP.
Devendar received his master’s in Computer Science and Engineering from the Indian Institute of Technology, Kanpur. His research interests include high speed interconnects, parallel programming models and HPC software.
Watch the video: https://youtu.be/_EB2Ixy-cNw
Learn more: http://www.mellanox.com/page/products_dyn?product_family=261&mtag=sharp
and
http://hpcadvisorycouncil.com/events/2019/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
DPDK (Data Plane Development Kit) Overview by Rami Rosen
* Background and short history
* Advantages and disadvantages
- Very high-speed networking acceleration at L2
- How this acceleration is achieved (hugepages, optimizations)
- rte_kni (and KCP)
- VPP (and the FD.io project), providing routing and switching.
- TLDK (Transport Layer Development Kit, TCP/UDP)
* Anatomy of a simple DPDK application.
* Development and governance model
* Testpmd: DPDK CLI tool
* DDP - Dynamic Device Profiles
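The "anatomy of a simple DPDK application" item above boils down to an EAL-style init followed by a busy-poll loop that pulls packets from a NIC queue in bursts (DPDK's `rte_eth_rx_burst`). Here is a toy, language-agnostic sketch of that loop shape in Python; `rx_burst` and `poll_loop` are hypothetical stand-ins, not DPDK APIs:

```python
from collections import deque

BURST_SIZE = 32  # DPDK apps typically poll packets in bursts (e.g. 32)

def rx_burst(queue, max_pkts=BURST_SIZE):
    """Toy stand-in for rte_eth_rx_burst(): drain up to max_pkts packets."""
    burst = []
    while queue and len(burst) < max_pkts:
        burst.append(queue.popleft())
    return burst

def poll_loop(queue, handle):
    """Busy-poll the queue until empty (a real app would spin forever)."""
    processed = 0
    while True:
        burst = rx_burst(queue)
        if not burst:
            break                 # real DPDK apps keep spinning instead
        for pkt in burst:
            handle(pkt)           # L2 processing, forwarding, etc.
        processed += len(burst)
    return processed

q = deque(range(100))             # 100 fake packets
print(poll_loop(q, handle=lambda pkt: None))  # 100
```

Polling in batches, rather than taking an interrupt per packet, is one of the optimizations (alongside hugepages) that the talk lists as the source of DPDK's speed.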
Rami Rosen is a Linux kernel expert and the author of "Linux Kernel Networking" (Apress, 2014).
Rami has published two articles about DPDK in the past year:
"Network acceleration with DPDK"
https://lwn.net/Articles/725254/
"Userspace Networking with DPDK"
https://www.linuxjournal.com/content/userspace-networking-dpdk
Webinar: NVIDIA JETSON – Artificial Intelligence in the Palm of Your Hand – Embarcados
Webinar objective: learn how the NVIDIA Jetson platform and its tools enable you to develop and deploy robots, drones, IVA applications, and other autonomous machines with AI technology that think for themselves.
Supported by: Arrow and NVIDIA.
Guest: Marcel Saraiva
Enterprise Account Manager at NVIDIA and an executive with 20 years of experience in the IT market, his career includes roles at SGI (Silicon Graphics), Intel, and ScanSource. He is an electrical engineer trained at FEI, with a postgraduate degree in Marketing from FAAP and an MBA in Business Management from FGV.
Webinar link: https://www.embarcados.com.br/webinars/nvidia-jetson-a-inteligencia-artificial-na-palma-de-sua-mao/
This session will provide an overview of the new Qualcomm® Snapdragon™ Automotive Development Platform (ADP), which integrates multiple optimized, production-grade Qualcomm Technologies, Inc. solutions in a single-board platform. The ADP enables rapid development, testing, and deployment of next-generation infotainment apps and experiences for the emerging connected-car opportunity. Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc.
Watch this presentation on YouTube:
https://www.youtube.com/watch?v=RMF3AQon3NU
This presentation, delivered by Aling Wu, AAEON, and Sebastian Borchers, Wahtari, was the fourth presentation of the Implementing AI: Vision Systems Webinar.
Harnessing the virtual realm for successful real world artificial intelligence – Alison B. Lowndes
Artificial Intelligence is impacting all areas of society, from healthcare and transportation to smart cities and energy. This talk covers how NVIDIA invests in both internal pure research and accelerated computation to enable its diverse customer base across gaming & extended reality, graphics, AI, robotics, simulation, high-performance scientific computing, healthcare, and more. You will be introduced to the GPU computing platform and shown successfully deployed real-world applications, as well as a glimpse into the current state of the art across academia, enterprise, and startups.
Securing future connected vehicles and infrastructure – Alan Tatourian
Slides from a keynote I gave at AZ Infragard. Since this was a keynote, I tried to dazzle the audience by talking more about technology and portraying security only as part of the underlying architecture of cognitive autonomous systems.
ENTER NVIDIA GRID
Delivering accelerated virtual desktops and applications.
This is where NVIDIA, the leader in graphics acceleration, stepped in to help. NVIDIA GRID technology allows IT to virtualize the physical GPU sitting in a server and share it with multiple VDI instances. This means that IT can deliver a true PC experience to any remote device from the datacenter. By providing a way to bring graphics acceleration to virtualization, NVIDIA GRID allows you to unlock all of the promises of productivity, mobility, security and flexibility for every one of your users.
With NVIDIA GRID you can safely house all of your current desktops and applications in the datacenter so that they can be delivered out to any device, be it a thin client, Chromebook, iPad, or BYO device. From an end-user perspective, this means users can be more productive, working with the devices and in the locations that best suit them. IT, in turn, can manage everything centrally in the datacenter, which vastly simplifies their lives.
by Mr. Tom Riley,
Director Global Business Development - Enterprise VR
For the full video of this presentation, please visit:
https://www.edge-ai-vision.com/2020/12/making-edge-ai-inference-programming-easier-and-flexible-a-presentation-from-texas-instruments/
For more information about edge AI and computer vision, please visit:
https://www.edge-ai-vision.com
Manisha Agrawal, Product Marketing Engineer at Texas Instruments, presents the “Making Edge AI Inference Programming Easier and Flexible” tutorial at the September 2020 Embedded Vision Summit.
Deploying an AI model at the edge doesn’t have to be challenging—but it often is. Embedded processing vendors have unique sets of software tools for deploying models. It takes time and investment to learn to use proprietary tools and to optimize the edge implementation to achieve your desired performance. While embedded vendors are providing proprietary tools for model deployment, the open source community is also advancing to standardize the model deployment process and make it hardware agnostic.
Texas Instruments has adopted open source software frameworks to make model deployment easier and more flexible. In this talk, you will learn about the struggles developers face when deploying models for inference on embedded processors and how TI addresses these critical software development challenges. You will also discover how TI enables faster time-to-market using a flexible open source development approach without the need to compromise performance, accuracy or power requirements.
This presentation, from Gregor Sievers, Ph.D., of dSPACE GmbH, addresses how the MIPI CSI-2℠, D-PHY℠, CCS, and A-PHY℠ specifications simplify validation and testing and help bring autonomous driving to the streets.
Cloud Native Night November 2017, Munich: Talk by Mario-Leander Reimer (@LeanderReimer, Principal Software Architect at QAware).
Join our Meetup: www.meetup.com/cloud-native-muc
Abstract: Until today existing enterprise applications are integrated, tested, and deployed as monoliths. This is very time-consuming and hinders agile business models. Cloud technology promises unlimited scalability, short release cycles, quick deployments and antifragility. But can we evolve these systems into the cloud with reasonable effort? What do we have to change and what are the risks involved? This talk will share the experiences from a real world customer project and present an industrialized approach for the Cloud-native evolution of existing IT landscapes.
OpenNebulaConf 2019 - Crytek: A Video Gaming Edge Implementation "on the shou..." – Dmytro Korzhevin
The presentation covers various cybersecurity aspects that stand behind AAA-level game projects. Most importantly, it covers a practically proven way to provision your own data (game services) in 22 geographical locations in 22 minutes, using an open-source solution, OpenNebula, and its DDC features. In those 22 minutes you receive a fully distributed mesh infrastructure, located in 22 different geo-locations (datacenters), provisioned using only bare-metal hardware servers, with a preconfigured GNU/Linux OS and a preconfigured VM on top of each server. Each server has its own control server in its own region, with a back-connection to a 'mother' server in a central location with high availability configured, its own network segments in each datacenter, elastic IPs, backend transfer facilities, and local BGP.
The IoT is becoming an extremely popular keyword in industry, with many different interpretations and definitions. One common requirement, however, is that it involves many sensor devices connected to Linux devices. In the past, user-space drivers for GPIO, I2C/SPI, and UART sensors were implemented separately from scratch for each product. This creates significant software engineering overhead as the number of GPIO, I2C/SPI, and UART sensors that must be supported grows dramatically. The IoTDK is a library that provides portability for sensor drivers to address this situation.
The talk will include a guide to IoTDK and 96Boards and a tutorial on programming I2C and GPIO devices. The target audience is anyone interested in IoT sensors, or anyone who would like to move from Arduino and Raspberry Pi to modern ARM CPUs effectively.
This presentation was delivered at LinuxCon Japan 2016 by Akira Tsukamoto.
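As a taste of the kind of user-space sensor handling such a tutorial covers, here is a small sketch that decodes a raw two-byte I2C temperature register into degrees Celsius. The register layout assumed here (12-bit signed value in the upper bits, 0.0625 °C per LSB, as in TI's TMP102 family) and the helper name are illustrative assumptions, not IoTDK APIs:

```python
def decode_temp(raw: bytes) -> float:
    """Decode a 2-byte big-endian register: 12-bit signed value in the
    upper bits, 0.0625 degC per LSB (TMP102-style layout, assumed)."""
    value = int.from_bytes(raw, "big") >> 4   # drop the 4 unused low bits
    if value & 0x800:                         # sign-extend 12-bit two's complement
        value -= 0x1000
    return value * 0.0625

# On real hardware the two bytes would come from a read on /dev/i2c-N;
# here we just decode sample register contents.
print(decode_temp(bytes([0x19, 0x00])))   # 25.0
print(decode_temp(bytes([0xE7, 0x00])))   # -25.0
```

Libraries like IoTDK aim to make exactly this sort of per-sensor decoding and bus access reusable across boards rather than rewritten per product.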
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/07/a-new-open-standards-based-open-source-programming-model-for-all-accelerators-a-presentation-from-codeplay-software/
Charles Macfarlane, Chief Business Officer at Codeplay Software, presents the “New, Open-standards-based, Open-source Programming Model for All Accelerators” tutorial at the May 2023 Embedded Vision Summit.
As demand for AI grows, developers are attempting to squeeze more and more performance from accelerators. Ideally, developers would choose the accelerators best suited to their applications. Unfortunately, today many developers are locked into limited hardware choices because they use proprietary programming models like NVIDIA’s CUDA. The oneAPI project was launched to create an open specification and open-source software that enables developers to write software using standard C++ code and deploy to GPUs from multiple vendors.
OneAPI is an open-source ecosystem based on the Khronos open-standard SYCL with libraries for enabling AI and HPC applications. OneAPI-enabled software is currently deployed on numerous supercomputers, with plans to extend into other market segments. OneAPI is evolving rapidly and the whole community of hardware and software developers is invited to contribute. In this presentation, Macfarlane introduces how oneAPI enables developers to write multi-target software and highlights opportunities for developers to contribute to making oneAPI available for all accelerators.
The Metaverse and AI: how can decision-makers harness the Metaverse for their... – Jen Stirrup
The Metaverse is popularized in science fiction, and now it is becoming closer to being a part of our daily lives through the use of social media and shopping companies. How can businesses survive in a world where Artificial Intelligence is becoming the present as well as the future of technology, and how does the Metaverse fit into business strategy when futurist ideas are developing into reality at accelerated rates? How do we do this when our data isn't up to scratch? How can we move towards success with our data so we are set up for the Metaverse when it arrives?
How can you help your company evolve, adapt, and succeed using Artificial Intelligence and the Metaverse to stay ahead of the competition? What are the potential issues, complications, and benefits that these technologies could bring to us and our organizations? In this session, Jen Stirrup will explain how to start thinking about these technologies as an organisation.
Enhancing Performance with Globus and the Science DMZ – Globus
ESnet has led the way in helping national facilities—and many other institutions in the research community—configure Science DMZs and troubleshoot network issues to maximize data transfer performance. In this talk we will present a summary of approaches and tips for getting the most out of your network infrastructure using Globus Connect Server.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 – Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Essentials of Automations: The Art of Triggers and Actions in FME – Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Generative AI Deep Dive: Advancing from Proof of Concept to Production – Aggregage
Join Maher Hanafi, VP of Engineering at Betterworks, in this new session where he'll share a practical framework to transform Gen AI prototypes into impactful products! He'll delve into the complexities of data collection and management, model selection and optimization, and ensuring security, scalability, and responsible use.
Pushing the limits of ePRTC: 100ns holdover for 100 days – Adtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori – Peter Spielvogel
Building better applications for business users with SAP Fiori.
• What is SAP Fiori and why it matters to you
• How a better user experience drives measurable business benefits
• How to get started with SAP Fiori today
• How SAP Fiori elements accelerates application development
• How SAP Build Code includes SAP Fiori tools and other generative artificial intelligence capabilities
• How SAP Fiori paves the way for using AI in SAP apps
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
Securing your Kubernetes cluster: a step-by-step guide to success! – KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Epistemic Interaction - tuning interfaces to provide information for AI support – Alan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
UiPath Test Automation using UiPath Test Suite series, part 4 – DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
2. Table of Contents
NVIDIA autonomous-driving sessions and related announcements at GTC 2018 (with supplementary slides):
1. S8666: Deploying Autonomous Vehicles with NVIDIA DRIVE
2. S8531: Deep Learning Infrastructure for Autonomous Vehicles
3. S8294: NVIDIA DRIVE Safety: NVIDIA's Strategies for Enabling Safety in Automotive Platforms
4. S8324: Synthetic Data Generation for an All-in-One Driver Monitoring System
5. ANNOUNCEMENT: NVIDIA and ARM Partner to Bring Deep Learning to Billions of IoT Devices
4. ANNOUNCING "PEGASUS"
Robotaxi DRIVE PX: Level 5 fully autonomous driving
▪ 2x Xavier (with integrated Volta GPU)
▪ 2x next-generation discrete GPU
▪ 320 TOPS of CUDA Tensor Core compute
▪ ASIL D certification
▪ Combined memory bandwidth: >1 TByte/s
▪ Automotive I/Os
▪ 16x GMSL high-speed camera inputs
▪ Multiple 10Gbit Ethernet
▪ CAN, FlexRay
▪ Early-access partners in late Q1
▪ A supercomputing data center in your trunk
(Supplementary slide)
5. DRIVE DEVELOPMENT PLATFORM
Development platform for DRIVE Xavier & DRIVE Pegasus, building on the capabilities of DRIVE PX 2:
- Auto-grade ASIL-D safety MCU
- Up to 16x CAN, 2x FlexRay
- Ethernet: 4x 10Gbps, 7x 1Gbps or 100Mbps, 5x 100Mbps
- 2x Xavier SoC: CV & DL accelerator, CUDA processing, 137GB/s LPDDR4x
- 2x discrete GPU (only with Pegasus): next-generation CUDA GPU, Tensor Core, support for IST, 384GB/s GDDR6
- Raw sensor input: 16x GMSL, 91Gbps
- 2x Xavier developer I/O: HDMI, 4x USB, UART (via USB), JTAG
(Block diagram: the two Xaviers and two next-generation GPUs are linked by NVLink and a PCIe switch, with deserializers (DeSer) for the camera inputs, the safety MCU, and Ethernet.)
DRIVE PX2 | DRIVE Xavier | DRIVE Pegasus | One Architecture
6. DRIVE SOFTWARE
NVIDIA deliverables on the DRIVE Development Platform:
- Included: DRIVE OS (Linux or QNX) with NVMEDIA and CUDA; the SAL (Sensor Abstraction Layer); CUDA accelerated libraries (cuDNN and, notably, TensorRT); NvMedia (VPI); OpenGL ES; DriveWorks algorithm modules and tools
- Not included: autonomous driving applications, DNNs, sensors & maps
VPI: Vision Primitives / Programmable Interface (OpenVX, VisionWorks supported by CUDA)
7. XAVIER SOC ENGINES & DRIVE OS
Xavier hardware engines, their DRIVE OS software APIs, and what each is used for. Inputs: 16x camera, lidar, radar, ...
- Video encoder (0.9 - 1.4 Gpix/s) and video decoder, via NvMedia (camera, ISP, encoder): compressed storage; archiving, computer data for DNN training, automotive simulation
- ISP (1.5 Gpix/s) and VIC (2 Gpix/s), via NvMedia: raw data storage; raw data for DNN training
- Deep Learning Accelerator (DLA, 5 FP16 / 10 INT8 TOPS), via TensorRT: DNN inference; detection, classification, segmentation
- PVA (0.5 - 1.3 TOPS) and stereo & optical flow engine (SOFE, 6 TOPS), via NvMedia (VPI): computer vision; autonomous vehicle algorithms
- GPU compute (1.3 FP32 TFLOPS CUDA, 20 INT8 TOPS DL), via CUDA: math computation; computer vision, autonomous vehicle algorithms
- GPU graphics (1.3 TFLOPS FP32), via OpenGL: human interface; visualization, in-vehicle display
- Xavier CPU
ISP: Image Signal Processor; VIC: Video Imaging Controller; PVA: Programmable Vision Accelerator; OF: Optical Flow; SOFE: Stereo Optical Flow Engine
8. DRIVEWORKS
Modules (with C APIs), abstracting the vehicle:
- Sensor and vehicle I/O abstraction
- Automotive image processing modules: stereo/rectification, color correction…
- Automotive computer vision modules: point cloud processing, SFM (Structure from Motion), 2D tracker…
Tools: recording, replaying, visualization.
9. DRIVE DEVZONE
SW installer & dev tools, latest SW, documentation, links to support, and updates on the ecosystem:
https://developer.nvidia.com/DRIVE
10. REVISITING THE FLOW…
What does it take to put an autonomous vehicle on the road?
1. Data acquisition to train the DNN and create the HD map: data acquired from sensors (DRIVE PX) becomes curated, annotated training data for neural network training.
2. Autonomous vehicle application development: the trained deep neural network and the HD map feed the autonomous vehicle applications (DRIVE PX).
3. Testing, in-vehicle or with simulation & re-simulation: DRIVE PX (HIL) and DGX (SIL).
13. DRIVE AI COMPUTER: 2018 → 2019
Development to Production | One Architecture
Production:
- DRIVE PX 2: 24 TOPS | 2x Parker + 2x Pascal dGPU
- DRIVE™ Xavier: 30 TOPS | 1x Xavier
- DRIVE™ Pegasus: 320 TOPS | 2x Xavier + 2x Next Gen GPU
Development:
- DRIVE™ Development Platform - Xavier: 60 TOPS | 2x Xavier
- DRIVE™ Development Platform - Pegasus: 320 TOPS | 2x Xavier + 2x Next Gen GPU
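The roadmap numbers above are internally consistent, which a quick back-of-the-envelope check shows: the dual-Xavier development platform's 60 TOPS implies 30 TOPS per Xavier (matching DRIVE Xavier), so Pegasus's two next-generation GPUs must contribute the remaining compute. The 130 TOPS per discrete GPU below is inferred arithmetic, not a figure NVIDIA states on these slides:

```python
xavier_tops = 60 / 2            # dev platform: 60 TOPS from 2x Xavier
pegasus_total = 320             # Pegasus: 2x Xavier + 2x next-gen GPU
dgpu_tops = (pegasus_total - 2 * xavier_tops) / 2

print(xavier_tops)   # 30.0, matching DRIVE Xavier (30 TOPS, 1x Xavier)
print(dgpu_tops)     # 130.0 TOPS inferred per next-generation GPU
```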
14. SIMULATION — THE PATH TO BILLIONS OF MILES
The world drives trillions of miles each year. The U.S. has 770 accidents per billion miles. A fleet of 20 test cars covers 1 million miles per year.
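These figures make the case for simulation concrete: at 1 million miles per year, a 20-car fleet would need 1,000 years of driving to cover the billion miles over which the U.S. averages 770 accidents, while the Constellation announcement that follows claims 10,000 systems can drive 3 billion virtual miles in a single year. A quick check of that arithmetic:

```python
fleet_miles_per_year = 1_000_000     # 20 test cars, 1M miles/year (slide 14)
target_miles = 1_000_000_000         # one billion miles (~770 U.S. accidents)
years_on_road = target_miles / fleet_miles_per_year
print(years_on_road)                 # 1000.0 years of physical test driving

sim_miles_per_year = 3_000_000_000   # 10,000 Constellations (slide 15)
print(sim_miles_per_year / fleet_miles_per_year)  # 3000.0x the fleet's coverage
```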
15. ANNOUNCING NVIDIA DRIVE SIM AND CONSTELLATION AV VALIDATION SYSTEM
▪ Virtual-reality AV simulator
▪ Same architecture as the DRIVE computer
▪ Simulate rare and difficult conditions
▪ Recreate scenarios
▪ Run regression tests
▪ Drive billions of virtual miles
10,000 Constellations drive 3B miles per year.
Example: 4-camera autonomous driving simulation, HIL (hardware-in-the-loop). DRIVE SIM generates virtual-reality 3D image stimulus; Constellation runs the autonomous driving application on real hardware; the two exchange 3D virtual images and car responses over the hardware platform's interfaces.
20. NVIDIA DRIVE FUNCTIONAL SAFETY ARCHITECTURE
The system operates safely even when faults are detected.
Holistic system — process & methods, processor design, software, algorithms, system design, validation.
ISO 26262 ASIL-D safety level | Partnership with BlackBerry QNX and TTTech | New AutoSIM virtual-reality 3D simulator.
- DRIVE Xavier / DDPX runs DRIVE AV on DRIVE OS (hypervisor, CUDA, TensorRT) to ISO 26262, with diverse engines, dual execution, ECC/parity, and diagnosis/BIST.
- DGX (DDPX emulation, same GPU architecture) enables SIL testing; DDPX itself enables HIL testing.
- Stimulus for testing/verification: traffic environment input from AutoSIM (photorealistic 3D CG rendered by DGX, for corner cases etc.), Re-SIM (captured real data), and the HD map.
HIL: hardware (DDPX) in the loop. SIL: software (DGX) in the loop. ISO 26262 ASIL-D.
21. DRIVE SOFTWARE (supplementary slide; identical to slide 6 above)
22. DRIVE OS: SAFE, SECURE, & REAL-TIME
Critical components are ASIL-D: QNX RTOS, Classic AUTOSAR, & hypervisor.
To be safe, it must be secure: secure boot, security services, firewall, & OTA.
To be safe, it must be real-time: QNX RTOS for the mission application; hypervisor for quality of service (QoS).
23. NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
WHY QNX FOR SAFETY?
Safety OS key selection criteria:
- ISO 26262 ASIL D certified RTOS
- ISO 26262 qualified tool chain (up to TCL 3)
- POSIX PSE52 standards certification (a requirement for CUDA and cuDNN support)
- Common Unix heritage with Linux (rich dependent library support)
TCL3: Tool Confidence Level. POSIX PSE52: supports application portability at the source-code level.
25. SOFTWARE SAFETY FRAMEWORK: OVERVIEW
Three Level Safety Supervision (3LSS) architecture, coherent with the Xavier SoC safety architecture and the DRIVE platform:
- Support for the Xavier SoC HW safety manager
- Integrated into DRIVE OS, so it is available on the DRIVE platform
- Anchored by an ASIL D capable safety MCU on the DRIVE platform
Provides a standard mechanism for handling potentially safety-critical errors and performing diagnostics. Supports freedom from interference in the execution, memory, and information-exchange domains.
26. NVIDIA SAFETY FRAMEWORK: THREE LEVEL SAFETY SUPERVISION (3LSS)
- L1SS: one partition on the hypervisor, running on the CCPLEX (CPU complex, 8x Carmel CPU)
- L2SS: safety OS & AUTOSAR on the SCE (Safety Control Engine, 2x R5 CPU in lock-step)
- L3SS: safety OS & AUTOSAR on an external ASIL-D MCU
Supporting mechanisms: SHM (Safety Hardware Manager); a safety-supervision serial channel on the SPI interface; an error-signaling pin; heartbeat monitoring on the SPI interface; a safety PMIC (power management IC).
Xavier safety features: diverse engines, dual execution, ECC/parity, diagnosis, BIST (built-in self test).
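One of the 3LSS mechanisms above, heartbeat monitoring over the SPI channel, can be sketched in miniature: a supervisor declares a fault when the supervised partition misses its heartbeat deadline. This toy model (timestamps in place of a real SPI channel, made-up class and method names) only illustrates the concept, not NVIDIA's implementation:

```python
class HeartbeatMonitor:
    """Toy supervisor: flags a fault when the gap between heartbeats
    exceeds the allowed deadline (the real channel would be SPI)."""
    def __init__(self, deadline_ms):
        self.deadline_ms = deadline_ms
        self.last_beat_ms = None
        self.fault = False

    def beat(self, now_ms):
        self.check(now_ms)            # late beats still register the fault
        self.last_beat_ms = now_ms

    def check(self, now_ms):
        if (self.last_beat_ms is not None
                and now_ms - self.last_beat_ms > self.deadline_ms):
            self.fault = True         # safe-state handling would start here
        return self.fault

mon = HeartbeatMonitor(deadline_ms=50)
for t in (0, 40, 80, 120):
    mon.beat(t)                       # regular beats every 40 ms: no fault
print(mon.fault)                      # False
print(mon.check(200))                 # True (80 ms since the beat at 120)
```

In the real architecture the fault would be escalated through the supervision levels and the error-signaling pin rather than simply recorded in a flag.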
27. SOFTWARE SAFETY FRAMEWORK: OUTCOME
- Flexible and extendable to handle increasing safety requirements
- Configurable and portable to new DRIVE versions
- An off-the-shelf safety solution, optimizing/reducing effort on the application side
- Enables easy deployment of safety applications on the DRIVE platform (provides safety services such as flow monitoring)
- Supports the fault-tolerant foundation of the DRIVE platform
29. NVIDIA DRIVE IX SDK
IX (Intelligent eXperience) Toolkit
Sense Inside & Outside the Vehicle | Deep Learning Powered | Early Access Q4
Your Car is an AI
Customer Application
DRIVE OS
DRIVE AV
Object, Path, Wait Perception
DRIVE IX
Gaze, Head Pose, Gestures, Face Recognition, Voice Recognition & Lip Reading
Exterior Driver Recognition
Automatic Personalization
Inattentive Driver Alert
Cyclist Alert
Distracted Driver Alert
Driver/Passenger Recognition
Multiple In-Car Sensors
Supplementary slides
31. NEED FOR SYNTHETIC DATA
No devices available – e.g. face landmarks at extreme angles
Manual labelling – limited by human precision and error
Manpower and time limits – recording in different environments
Sensor interference with scene – glasses for eye tracking, optical markers
Needs high resolution devices – head poses, gaze
Lacks associativity and completeness – multiple recordings for multiple parameters
Where does real world data fall short?
32. NEED FOR SYNTHETIC DATA
More flexible – environment parameters, camera distance, background
Error-free – no manual labelling or human errors
Accurate – synthetic data readings are highly accurate, with no sensor noise
Both high-resolution and low-resolution images can be generated
Allows labelling of occluded areas
Both 3D and 2D labels can be generated accurately
Fast and economical
Where does synthetic data offer advantages?
34. DATA GENERATION PIPELINE
Steps involved in synthetic data generation:
3D Head Scan with High Resolution
Retopology
Defining Mesh Deformation
Annotation
35. DATA GENERATION PIPELINE
High Resolution 3D Head Scan
• Capture accurate 3D face details using depth sensor
or multiple synchronized cameras using triangulation
• High density mesh (~0.1 mm resolution)
• Why can’t we use such a high-resolution scan directly?
• Requires heavy computation to transform each mesh vertex
• More manual effort to define deformations and key shapes
http://ten24.info/10-x-high-resolution-head-scans-avaliable-to-download/
[2]
36. DATA GENERATION PIPELINE
Retopology
Reduce the mesh vertex count to save computation cost
• Reduced vertex count (~1000 x reduction)
• Add displacement map to preserve details
• Easier to define shape keys manually on reduced mesh
37. DATA GENERATION PIPELINE
Mesh Deformation
• Define shape keys and set vertices manually to define a face feature (e.g. eyes looking down, smile, eyebrow raised)
• Intermediate values are interpolated automatically
• Allows programmatic control of parameters
[1]
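The shape-key mechanism above amounts to linear blending of vertex positions; a minimal numpy sketch (the mesh size and key name are hypothetical):

```python
import numpy as np

def apply_shape_keys(base_verts, shape_keys, weights):
    """Blend a base mesh toward named key shapes.

    base_verts: (N, 3) rest-pose vertex positions.
    shape_keys: name -> (N, 3) vertex positions of the fully applied key.
    weights:    name -> float in [0, 1]; intermediate weights are the
                automatically interpolated in-between poses.
    """
    out = base_verts.astype(float).copy()
    for name, target in shape_keys.items():
        w = weights.get(name, 0.0)
        out += w * (target - base_verts)  # linear interpolation toward the key
    return out

# Hypothetical 4-vertex mesh with one "eyes_down" key at half strength:
base = np.zeros((4, 3))
keys = {"eyes_down": np.tile([0.0, -1.0, 0.0], (4, 1))}
blended = apply_shape_keys(base, keys, {"eyes_down": 0.5})  # each y -> -0.5
```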
38. DATA GENERATION PIPELINE
Landmark annotation
• Manually mark points of interest
• Allows tracking and automatic labelling of face landmarks
• 2D and 3D point coordinates are available as the environment is synthetic
• Occluded points can be accepted/rejected programmatically
[1]
39. DATA GENERATION PIPELINE
Example annotated image
The following features are saved along with the image in a file
• Face bounding box
• Face landmarks
• Head pose
• Gaze
• Eyelid, pupil (eyeball), and iris markers
• Face ID
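One plausible way to serialize such a per-image record; field names and values here are illustrative placeholders, not NVIDIA's actual annotation schema:

```python
import json

# Hypothetical annotation for one rendered frame; every value is made up.
annotation = {
    "image": "frame_000123.png",
    "face_id": 17,
    "face_bbox": [412, 180, 228, 228],                     # x, y, w, h (pixels)
    "face_landmarks_2d": [[455.1, 240.7], [530.2, 238.9]], # truncated list
    "face_landmarks_3d": [[0.031, 0.012, 0.54], [0.049, 0.011, 0.55]],
    "head_pose_deg": {"yaw": -12.5, "pitch": 4.2, "roll": 0.8},
    "gaze_deg": {"yaw": -8.0, "pitch": -15.0},
    "eye_markers": {"eyelid": 12, "pupil": 4, "iris": 8},  # marker counts
}

# Saved alongside the image as JSON; loading restores the same record.
restored = json.loads(json.dumps(annotation))
```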
41. PARAMETER SETTING
Head Pose control
• Position camera to cover head
• Camera oriented towards center of both eyes
• Head orientations can be set to captured values to mimic human head motions
42. PARAMETER SETTING
Gaze control
• Position subject with respect to camera
• Set target gaze location
• Rotate eyes to look at target location
• Render cases which honor anatomical constraints
• Unobstructed (pupil exposure > 35%)
• Within the anatomically allowed pitch/yaw range
• Visible to the camera
Constraints
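The constraint filter above can be sketched as a simple predicate; the 35% pupil-exposure threshold comes from the slide, while the pitch/yaw limits are illustrative placeholders for the anatomical range:

```python
def passes_gaze_constraints(pupil_exposure, gaze_yaw_deg, gaze_pitch_deg,
                            visible_to_camera,
                            yaw_limit=45.0, pitch_limit=35.0):
    """Return True only for renderable gaze samples.

    pupil_exposure: fraction of the pupil that is unobstructed (0..1).
    yaw/pitch limits are assumed anatomical bounds, in degrees.
    """
    if pupil_exposure <= 0.35:             # pupil must be >35% exposed
        return False
    if abs(gaze_yaw_deg) > yaw_limit:      # outside anatomical yaw range
        return False
    if abs(gaze_pitch_deg) > pitch_limit:  # outside anatomical pitch range
        return False
    return visible_to_camera               # must be visible to the camera

# A mostly exposed pupil looking slightly left and down passes:
ok = passes_gaze_constraints(0.8, -10.0, -5.0, True)
```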
45. POST PROCESSING
Domain adaptation is applied to synthetic data so that it closely resembles real data. The following techniques can be used to adapt to the real-world domain:
Gaussian filtering/blurring – imitates the focal blur of the camera
Noise addition – imitates sensor noise and environmental noise/dust
Brightness/contrast correction – simulates different lighting conditions
Scaling – compensates for face distance and bounding-box tightness
Mirroring – most use cases are agnostic to mirroring; others can use transformed parameters
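A numpy-only sketch of these adaptations on a grayscale image in [0, 1]; the kernel, noise level, and gain/offset values are arbitrary choices, and a production pipeline would more likely use OpenCV or PIL:

```python
import numpy as np

def adapt_to_real_domain(img, rng):
    """Apply blur, noise, brightness/contrast, and mirroring to `img`."""
    # Gaussian-like blur with a small separable [1, 2, 1]/4 kernel
    # (imitates the camera's focal blur).
    k = np.array([0.25, 0.5, 0.25])
    img = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    img = np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, img)
    # Additive Gaussian noise (imitates sensor/environment noise).
    img = img + rng.normal(0.0, 0.01, img.shape)
    # Brightness/contrast correction (simulates lighting changes).
    img = 0.9 * (img - 0.5) + 0.5 + 0.05
    # Mirroring (most use cases are agnostic to it).
    # Scaling to a fixed bounding box is omitted for brevity.
    img = img[:, ::-1]
    return np.clip(img, 0.0, 1.0)

rng = np.random.default_rng(0)
adapted = adapt_to_real_domain(np.full((64, 64), 0.5), rng)
```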
48. DEPLOYMENT
Execution     | sec/frame | 150K frames | Comments
CPU           | 40        | 70 days     | Consumes the entire CPU
GPU           | 8         | 14 days     | 30–60% GPU (~5 sec/frame)
GCF (10 GPUs) | 0.8       | 1.4 days    |
Training a DNN requires images on the order of ~150K and higher
Speed-up of 5x with a GPU, and a further 10x using a GCF (GPU Compute Farm)
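The day counts in the table follow directly from seconds-per-frame times frame count; a quick arithmetic check:

```python
def render_days(sec_per_frame, n_frames=150_000):
    """Wall-clock days to render n_frames at a given per-frame cost."""
    return sec_per_frame * n_frames / 86_400  # 86,400 seconds per day

cpu_days = render_days(40)    # ~69.4, matching the table's ~70 days
gpu_days = render_days(8)     # ~13.9, matching ~14 days (5x speed-up)
gcf_days = render_days(0.8)   # ~1.39, matching ~1.4 days (10x more)
```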
49. Table of Contents
NVIDIA autonomous-driving presentations and related announcements at GTC 2018 (with supplementary slides)
1. S8666: Deploying Autonomous Vehicles with NVIDIA DRIVE
2. S8531: Deep Learning Infrastructure for Autonomous Vehicles
3. S8294: NVIDIA DRIVE Safety: NVIDIA's Strategies for Enabling Safety in Automotive
Platforms
4. S8324: Synthetic Data Generation for an All-in-One Driver Monitoring System
5. ANNOUNCEMENT: NVIDIA and ARM Partner to Bring Deep Learning to Billions of IoT
Devices
50. NVIDIA AND ARM PARTNER TO BRING DEEP LEARNING
TO BILLIONS OF IOT DEVICES
NVIDIA Deep Learning Accelerator IP to be Integrated into ARM Project Trillium
Platform, Easing Building of Deep Learning IoT Chips
GPU Technology Conference — NVIDIA and Arm today announced that they are partnering to bring deep
learning inferencing to the billions of mobile, consumer electronics and Internet of Things devices that
will enter the global marketplace.
Under this partnership, NVIDIA and Arm will integrate the open-source NVIDIA Deep Learning
Accelerator (NVDLA) architecture into Arm’s Project Trillium platform for machine learning. The
collaboration will make it simple for IoT chip companies to integrate AI into their designs and help put
intelligent, affordable products into the hands of billions of consumers worldwide.
Tuesday, March 27, 2018
51. ANNOUNCING DRIVE XAVIER SAMPLING IN Q1
Most Complex SOC Ever Made | 9 Billion Transistors, 350mm2, 12nm FFN | ~8,000 Engineering Years
Diversity of Engines Accelerate Entire AV Pipeline | Designed for ASIL-D AV
Volta GPU: FP32/FP16/INT8 multi-precision, 512 CUDA cores, 1.3 CUDA TFLOPS, 20 Tensor Core TOPS
ISP: 1.5 GPIX/s, native full-range HDR, tile-based processing
PVA: 1.6 TOPS, stereo disparity, optical flow, image processing
Video Processor: 1.2 GPIX/s encode, 1.8 GPIX/s decode
16 CSI: 109 Gbps
1 Gbps & 10 Gbps Ethernet
256-bit LPDDR4: 137 GB/s
DLA: 5 TFLOPS FP16, 10 TOPS INT8
Carmel ARM64 CPU: 8 cores, 10-wide superscalar, 2700 SpecInt2000
Functional safety features: dual execution mode, parity & ECC
World’s First Autonomous Machine Processor
52. NVDLA (NVIDIA DEEP LEARNING ACCELERATOR)
Also integrated into the Xavier SoC
Further improvements in power efficiency
Command Interface
Tensor Execution Micro-controller
Memory Interface
Input DMA (activations and weights)
Unified 512 KB input buffer (activations and weights)
Sparse weight decompression
Native Winograd input transform
MAC array: 2048 INT8, or 1024 INT16, or 1024 FP16
Output accumulators
Output post-processor (activation function, pooling, etc.)
Output DMA
Exploits the sparsity of coefficients to reduce memory bandwidth
The latest algorithms that minimize multiplications reduce chip size and power consumption
Examples of the many latest techniques expected here
Other functions receive sensible, rational acceleration
Reference NVDLA: http://nvdla.org
53. Why reduce arithmetic precision and multiplication count in AI inference?
Artem Vasilyev, “CNN Optimization for Embedded Systems and FFT”
CS231n: CNN for Visual Recognition Course, Stanford, 2017
At INT16, a multiplication consumes 13x the energy of an addition
At FP16, a multiplication consumes 2.25x the energy of an addition
At FP32, a multiplication consumes 4.7x the energy of an addition
It goes without saying that arithmetic precision must be lowered as far as possible.
Multiplication-count reduction is implemented in the cuDNN library and in the DLA (Deep Learning Accelerator) architecture.
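Using the multiply-vs-add energy ratios cited on this slide (INT16: 13x, FP16: 2.25x, FP32: 4.7x), a rough back-of-the-envelope for why reducing multiplications pays off; the workload sizes are made up:

```python
# Energy of one multiply, in units of one addition at the same precision,
# per the figures on this slide.
MUL_VS_ADD = {"INT16": 13.0, "FP16": 2.25, "FP32": 4.7}

def mac_energy(precision, n_mul, n_add):
    """Relative energy of a workload, measured in 'addition units'."""
    return MUL_VS_ADD[precision] * n_mul + n_add

# Halving the multiplies (e.g. via a Winograd-style transform) at INT16:
baseline = mac_energy("INT16", 1000, 1000)  # 14000.0 add-units
reduced = mac_energy("INT16", 500, 1000)    # 7500.0 add-units
```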
154. Autonomous driving in general, algorithm-related
• S81014 - Advancing State-of-the-Art of Autonomous Vehicles and Robotics Research using AWS GPU Instances (Presented by Amazon Web Services)
• TRI’s use of AWS and its development environment
• S8140 - Deep Learning for Automated Systems: From the Warehouse to the Road
• Clemson Univ.: autonomous-driving software development flow
• S8862 - Autonomous Algorithms
• Panel session: NNAISENSE’s AUDI automated-parking experiments, fka’s use of DNNs for path generation, and more
155. Map and localization related
• S8834 - In-Vehicle Change Detection, Closing the Loop in the Car
• HERE – self-healing maps and fleet sensors
• S8861 - Crowd-sourcing, Map updates, and Predictions as Complementary Solutions for Mapping
• Panel (explorer.ai, VoxelMaps, DeepMap) – crowd-sourced maps, voxel-based maps, open projects, etc.
• S8618 - GPU Accelerated LIDAR Based Localization for Automated Driving Applications
• FORD – accelerating localization with GPUs
156. In-cabin related
• S8758 - The Future of the In-Car Experience
• Affectiva’s emotion-detection AI, examples of its training datasets, and transfer-learning results
• S8970 - Creating AI-Based Digital Companion for Mercedes-Benz Vehicles
• Implementation of in-vehicle AI at Mercedes-Benz