3D perception is crucial for understanding the real world. It offers many benefits and new capabilities over 2D across diverse applications, from XR and autonomous driving to IoT, camera, and mobile. 3D perception with machine learning is setting the new state of the art (SOTA) in areas such as depth estimation, object detection, and neural scene representation. Making these SOTA neural networks feasible for real-world deployment on mobile devices with tight power, thermal, and performance budgets has been a challenge. Qualcomm AI Research has developed not only novel AI techniques for 3D perception but also full-stack AI optimizations that enable real-world deployments and energy-efficient solutions. This presentation explores the latest research that is enabling efficient 3D perception while maintaining neural network model accuracy. You’ll learn about:
- The advantages of 3D perception over 2D and the need for 3D perception across applications
- Advancements in 3D perception research by Qualcomm AI Research
- Our future 3D perception research directions
5G + AI: The Ingredients for Next Generation Wireless Innovation (Qualcomm Research)
5G and AI are two of the most disruptive technologies the world has seen in decades. While each is individually revolutionizing industries and enabling new experiences, the combination of both 5G and AI is going to be truly transformative. Applying AI not only to the 5G network but also the device will lead to more efficient wireless communications, longer battery life and enhanced user experiences. The low latency and high capacity of 5G will also allow AI processing to be distributed amongst the device, edge cloud and central cloud, enabling flexible system solutions for a variety of use cases. At Qualcomm Technologies, we are not only working on cutting-edge research for 5G and AI, but we are also exploring their synergies to realize our vision of the future. View this presentation to learn how AI is making 5G better -- in the network and on the device, why on-device AI processing is essential, and how 5G is empowering distributed learning over wireless.
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI (Qualcomm Research)
How do you find the best solution when faced with many choices? Combinatorial optimization is a field of mathematics that seeks optimal solutions for complex problems involving many variables. Numerous business verticals can benefit from combinatorial optimization, from transport and supply chains to the mobile industry.
More recently, we’ve seen gains from applying AI to combinatorial optimization, making the method more scalable and significantly cheaper. The approach replaces the manual tuning of traditional heuristics with an AI agent that provides fast metric estimates (a toy sketch of this idea follows the list below).
In this presentation you will find out:
- Why AI is crucial in combinatorial optimization
- How it can be applied to two use cases: improving chip design and hardware-specific compilers
- The state-of-the-art results achieved by Qualcomm AI Research
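As a toy, hedged illustration of "fast metric estimation" (the tour-length metric, the features, and the linear surrogate below are invented stand-ins, not the webinar's method): a cheap learned model scores candidate solutions so the search loop can screen far more of them than the expensive exact metric would allow.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
pts = rng.random((n, 2))
dist = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)

def true_cost(order):
    """Stand-in for an expensive metric (e.g., a full tool or simulator run):
    tour length plus a nonlinear penalty on the worst hop."""
    hops = dist[order[:-1], order[1:]]
    return hops.sum() + 5.0 * hops.max() ** 2

def features(order):
    """Cheap features of a candidate ordering (toy choice)."""
    hops = dist[order[:-1], order[1:]]
    return np.array([hops.sum(), hops.max(), hops.min(), 1.0])

# Fit a linear surrogate on a small budget of exact evaluations.
train = [rng.permutation(n) for _ in range(50)]
X = np.stack([features(o) for o in train])
y = np.array([true_cost(o) for o in train])
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def fast_estimate(order):
    """The AI-agent-style fast metric estimate used inside the search loop."""
    return features(order) @ w

# Screen many candidates cheaply; verify only the apparent best exactly.
candidates = [rng.permutation(n) for _ in range(5000)]
best = min(candidates, key=fast_estimate)
print(f"estimated {fast_estimate(best):.3f} vs exact {true_cost(best):.3f}")
```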
This presentation outlines the synergistic nature of 5G and AI -- two disruptive areas of innovation that can change the world. It illustrates the benefits of adopting AI for the advancement of 5G, and showcases the latest progress made by Qualcomm Technologies, Inc.
5G is designed to serve an unprecedented range of capabilities with a single global standard. With enhanced mobile broadband (eMBB), massive IoT (mIoT), and mission-critical IoT, the three pillars of 5G represent extremes in performance and associated complexity. For IoT services, NB-IoT and eMTC devices prioritize low power consumption and the lowest complexity for wide-area deployments (LPWA), while enhanced ultra-reliable, low-latency communication (eURLLC), along with time-sensitive networking (TSN), addresses the most stringent use-case requirements. But there is an opportunity to more efficiently address a broad range of mid-tier applications whose capabilities fall between these extremes.
In 5G NR Release 17, 3GPP introduced a new tier of reduced capability (RedCap) devices, also known as NR-Light. It is a new device platform that bridges the capability and complexity gap between the extremes in 5G today with an optimized design for mid-tier use cases. With the recent standards completion, NR-Light is set to efficiently expand the 5G universe to connect new frontiers.
Download this presentation to learn:
• What NR-Light is and why it can herald the next wave of 5G expansion
• How NR-Light is accelerating the growth of the connected intelligent edge
• Why NR-Light is a suitable 5G migration path for mid-tier LTE devices
Object detection is a computer technology related to computer vision and image processing that deals with detecting instances of semantic objects of a certain class (such as humans, buildings, or cars) in digital images and videos.
A small helping hand from me to my engineering colleagues and my other friends in need of object detection.
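As a minimal, hedged illustration of what this looks like in practice (assuming a Python environment with PyTorch and torchvision installed; the image path and score threshold are placeholders):

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# A pretrained COCO detector covers classes such as person, car, and bus.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

img = Image.open("street.jpg").convert("RGB")  # hypothetical input image
with torch.no_grad():
    pred = model([to_tensor(img)])[0]  # boxes, labels, scores for one image

# Keep only confident detections; 0.8 is an arbitrary demo threshold.
for box, label, score in zip(pred["boxes"], pred["labels"], pred["scores"]):
    if score > 0.8:
        print(int(label), [round(v, 1) for v in box.tolist()], float(score))
```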
Self-Driving Cars with Convolutional Neural Networks (CNN) (ssuserf79e761)
Self-driving cars aim to revolutionize car travel by making it safe and efficient. In this article, we outlined some of the key components such as LiDAR, RADAR, cameras, and most importantly – the algorithms that make self-driving cars possible.
A few things still need to be addressed:
- The algorithms in use cannot yet reliably perceive roads and lanes, because some roads lack markings and other signs.
- The optimal sensing modality for localization, mapping, and perception still lacks accuracy and efficiency.
- Vehicle-to-vehicle communication is still a dream, but work is being done in this area as well.
- The field of human-machine interaction is underexplored, with many open, unsolved problems.
Q-learning is one of the most commonly used DRL algorithms for self-driving cars. It falls under the category of model-free learning, in which the agent tries to approximate the optimal state-action value function directly from experience. The policy still determines which state-action pairs, or Q-values, are visited and updated (see the equation below). The goal is to find the optimal policy by interacting with the environment while correcting the policy whenever the agent makes an error.
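The update rule referenced above is the standard tabular Q-learning update; the equation itself did not survive into this text, so it is reconstructed here:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$$

where $\alpha$ is the learning rate, $\gamma$ the discount factor, $r_{t+1}$ the reward received after taking action $a_t$ in state $s_t$, and $s_{t+1}$ the resulting next state.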
Extended reality (XR) is already providing revolutionary experiences today, and it’s important for developers to know all the latest advances. Taking your XR development to the next level of immersion within the power and thermal constraints of a mobile device is a critical challenge. A new era in distributed computing powered by 5G, on-device processing, and edge cloud processing could offer a solution. What if we combine the power-efficient, latency-sensitive on-device rendering and tracking of the XR headset with the partial rendering capabilities of the edge cloud over a 5G link with low latency and high quality-of-service? We get boundless mobile XR experiences with photorealistic visuals. Making this vision a reality for developers will require the entire XR and 5G ecosystem coming together.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/06/autonomous-driving-ai-workloads-technology-trends-and-optimization-strategies-a-presentation-from-qualcomm/
Ahmed Sadek, Senior Director of Engineering at Qualcomm, presents the “Autonomous Driving AI Workloads: Technology Trends and Optimization Strategies” tutorial at the May 2022 Embedded Vision Summit.
Enabling safe, comfortable and affordable autonomous driving requires solving some of the most demanding and challenging technological problems. From centimeter-level localization to multimodal sensor perception, sensor fusion, behavior prediction, maneuver planning and trajectory planning and control, each one of these functions introduces its own unique challenges that must be solved, verified, tested and deployed on the road.
In this talk, Sadek reviews recent trends in AI workloads for autonomous driving as well as promising future directions. He covers AI workloads in camera, radar, and lidar perception, as well as in environmental modeling, behavior prediction, and drive policy. To enable optimized network performance at the edge, quantization and neural architecture optimization are typically performed either during or after training. Sadek also covers the importance of hardware-aware quantization and network architecture optimization, and introduces Qualcomm's innovations in these areas.
The Fog or Edge Computing model complements Cloud Computing with small, typically sensor-enabled, IoT-connected devices that process distributed data at its source. As this model matures, we see an uptake of a 3-tier architecture with Intelligent Gateways that aggregate sensor input before communicating with data centers or a Cloud. Two forces will drive the practice of distributing Intelligence (Understanding/Reasoning/Learning) to the Gateway. The first is the presence of the Gateway itself, which enables a standards-based approach to distributing intelligence and moving it closer to the edge. The second is the trend toward simplifying system requirements by processing training data or model validation with big data prior to deployment, and using small-footprint devices for operational systems.
This webinar will present an overview of the relevant technologies and trends. Participants will learn about the state of the art today, and how to identify apps in their own environment that would be good candidates for Intelligent Edge solutions.
Bringing AI research to wireless communication and sensing (Qualcomm Research)
AI for wireless is already here, with applications in areas such as mobility management, sensing and localization, smart signaling, and interference management. Recently, Qualcomm Technologies has prototyped the AI-enabled air interface and launched the Qualcomm 5G AI Suite. These developments are possible thanks to expertise in both wireless and machine learning from over a decade of foundational research in these complementary fields.
Our approach brings together the modeling flexibility and computational efficiency of machine learning and the out-of-domain generalization and interpretability of wireless domain expertise.
In this webinar, Qualcomm AI Research presents an overview of state-of-the-art research at the intersection of the two fields and offers a glimpse into the future of the wireless industry.
Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
Speakers:
Arash Behboodi, Machine Learning Research Scientist (Senior Staff Engineer/Manager), Qualcomm AI Research
Daniel Dijkman, Machine Learning Research Scientist (Principal Engineer), Qualcomm AI Research
This video and the corresponding presentation look at the definition of Extended Reality (XR) and how 3GPP 5G standards are evolving to cater for XR in future releases.
XR is an umbrella term for all immersive technologies: those we already have today, such as Augmented Reality (AR), Virtual Reality (VR), and Mixed Reality (MR), plus those yet to be created. All immersive technologies extend the reality we experience by either blending the real and virtual worlds or by creating a fully immersive experience.
The video for this presentation is available here: https://www.youtube.com/watch?v=nDAmK3jhCuQ
All our #3G4G5G slides and videos are available at:
Videos: https://www.youtube.com/3G4G5G
Slides: https://www.slideshare.net/3G4GLtd
5G Page: https://www.3g4g.co.uk/5G/
Free Training Videos: https://www.3g4g.co.uk/Training/
Soon gi Park (LetinAR): PinMR: Novel Optical Solution for AR Glasses (AugmentedWorldExpo)
A talk from the XR Enablement Track at AWE USA 2019 - the World's #1 XR Conference & Expo in Santa Clara, California May 29-31, 2019.
Soon gi Park (LetinAR): PinMR: Novel Optical Solution for AR Glasses
LetinAR is a Seoul-based startup developing see-through optical systems for wearable augmented reality devices. Based on its unique pin-mirror technology, which takes a very different approach from any other existing combiner optics, LetinAR has demonstrated an ultrawide field of view of more than 80 degrees at 8K resolution. LetinAR has also presented a form-factor-oriented prototype with the same appearance as normal glasses. In this presentation, we introduce the pin-mirror technology and its benefits for wearable AR hardware, including high image quality without degradation, visual comfort with correct vergence and accommodation, a wide field of view for immersive experiences, and a cost-effective manufacturing process. We believe the advantages of pin-mirror technology will not only satisfy the optical requirements of future AR devices but also hasten the beginning of the AR/VR/MR era.
https://awexr.com
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/qualcomm/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit-mangen
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Michael Mangen, Product Manager for Camera and Computer Vision at Qualcomm, presents the "High-resolution 3D Reconstruction on a Mobile Processor" tutorial at the May 2016 Embedded Vision Summit.
Computer vision has come a long way. Use cases that were previously not possible in mass-market devices are now more accessible thanks to advances in depth sensors and mobile processors. In this presentation, Mangen provides an overview of how we are able to implement high-resolution 3D reconstruction – a capability typically requiring cloud/server processing – on a mobile processor. This is an exciting example of how new sensor technology and advanced mobile processors are bringing computer vision capabilities to broader markets.
Distributed System for 3D Remote Monitoring Using Kinect Depth Cameras (cscpconf)
This article describes the design and development of a system for remote indoor 3D monitoring using an undetermined number of Microsoft® Kinect sensors. In the proposed client-server system, the Kinect cameras can be connected to different computers, thereby addressing the hardware limitation of one sensor per USB controller. The reason behind this limitation is the high bandwidth needed by the sensor, which also becomes an issue for the distributed system's TCP/IP communications. Since the traffic volume is too high, 3D data has to be compressed before it can be sent over the network. The solution consists in self-coding the Kinect data into RGB images and then using a standard multimedia codec to compress the color maps. Information from different sources is collected on a central client computer, where point clouds are transformed to reconstruct the scene in 3D. An algorithm is proposed to conveniently merge the skeletons detected locally by each Kinect, so that monitoring of people is robust to self- and inter-user occlusions. Final skeletons are labeled, and the trajectories of every joint can be saved for event reconstruction or further analysis.
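A minimal sketch of the "self-coding" idea the abstract describes: packing 16-bit Kinect depth into the channels of an 8-bit RGB image so a standard multimedia codec can compress it. The high/low-byte split below is an assumption for illustration, not necessarily the authors' exact mapping:

```python
import numpy as np

def depth_to_rgb(depth_mm: np.ndarray) -> np.ndarray:
    """Pack a 16-bit depth map (e.g., millimeters) into an 8-bit 3-channel image.

    High byte -> R, low byte -> G, B left unused. A production system would
    choose a codec-friendlier mapping to limit lossy-compression artifacts.
    """
    rgb = np.zeros((*depth_mm.shape, 3), dtype=np.uint8)
    rgb[..., 0] = (depth_mm >> 8).astype(np.uint8)    # high byte
    rgb[..., 1] = (depth_mm & 0xFF).astype(np.uint8)  # low byte
    return rgb

def rgb_to_depth(rgb: np.ndarray) -> np.ndarray:
    """Invert the packing after the frame has been decoded on the client."""
    return (rgb[..., 0].astype(np.uint16) << 8) | rgb[..., 1].astype(np.uint16)

depth = np.random.randint(0, 4096, (480, 640), dtype=np.uint16)  # fake Kinect frame
assert np.array_equal(rgb_to_depth(depth_to_rgb(depth)), depth)  # lossless round trip
```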
A Vision-Based Mobile Platform for Seamless Indoor/Outdoor Positioning (Guillaume Gales)
The emergence of smartphones equipped with Internet access, high-resolution cameras, and positioning sensors opens up great opportunities for visualising geospatial information within augmented reality applications. While smartphones are able to provide geolocalisation, the inherent uncertainty in the estimated position, especially indoors, does not allow for completely accurate and robust alignment of the data with the camera images.
In this paper we present a system that exploits computer vision techniques in conjunction with GPS and inertial sensors to create a seamless indoor/outdoor vision-based positioning platform. The vision-based approach estimates the pose of the camera relative to the façade of a building and recognises the façade from a georeferenced image database. This permits the insertion of 3D widgets into the user's view with a known orientation relative to the façade. For example, in Figure 1 (a) we show how this feature can be used to overlay directional information on the input image. Furthermore, we provide an easy and intuitive interface for non-expert users to add their own georeferenced content to the system, encouraging volunteered geographic information. Indeed, to achieve this, users only need to drag and drop predefined 3D widgets onto a reference view of the façade; see Figure 1 (b). The infrastructure is flexible in that we can add different layers of content on top of the façades, which opens many possibilities for different applications. Furthermore, the system provides a representation suitable for both manual and automatic content authoring.
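A hedged sketch of the façade-matching step using standard OpenCV building blocks; the paper's actual pipeline is not detailed in this abstract, so ORB features and a RANSAC homography are illustrative stand-ins, and the file names are hypothetical:

```python
import cv2
import numpy as np

# Reference façade image from the georeferenced database, and a live frame.
ref = cv2.imread("facade_reference.jpg", cv2.IMREAD_GRAYSCALE)
frame = cv2.imread("camera_frame.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(2000)
k1, d1 = orb.detectAndCompute(ref, None)
k2, d2 = orb.detectAndCompute(frame, None)

# Hamming-distance brute-force matching for binary ORB descriptors.
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
matches = sorted(matches, key=lambda m: m.distance)[:200]

src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

# The homography maps the planar façade into the camera frame; the RANSAC
# inlier ratio can double as a recognition score against the image database.
H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print("inlier ratio:", mask.mean())
```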
Laser scanning is the controlled deflection of laser beams. 3D scanning involves analyzing real-world objects to collect data on their shape and appearance. The collected data is then used to construct digital 3D models, consisting of a polygon mesh or a point cloud of geometric samples on the surface of the object.
https://www.tejjy.com/our-services/bim-services/3d-laser-scanning-services/
Indoor 3D Video Monitoring Using Multiple Kinect Depth Cameras (ijma)
This article describes the design and development of a system for remote indoor 3D monitoring using an undetermined number of Microsoft® Kinect sensors. In the proposed client-server system, the Kinect cameras can be connected to different computers, addressing this way the hardware limitation of one sensor per USB controller. The reason behind this limitation is the high bandwidth needed by the sensor, which becomes also an issue for the distributed system TCP/IP communications. Since traffic volume is too high, 3D data has to be compressed before it can be sent over the network. The solution consists in self-coding the Kinect data into RGB images and then using a standard multimedia codec to compress color maps. Information from different sources is collected into a central client computer, where point clouds are transformed to reconstruct the scene in 3D. An algorithm is proposed to merge the skeletons detected locally by each Kinect conveniently, so that monitoring of people is robust to self and inter-user occlusions. Final skeletons are labeled and trajectories of every joint can be saved for event reconstruction or further analysis.
MEMS Laser Scanning, the Platform for the Next Generation of 3D Depth Sensors (Jari Honkanen)
MicroVision's MEMS Laser Beam Scanning based 3D Depth Sensing Technology presentation by Jari Honkanen at MEMS & Sensors Industry Group Conference Asia 2016, Shanghai, China, September 13-14, 2016
iMinds insights - 3D Visualization Technologies (iMinds insights)
Transforming the way we deal with information - from consumption to interaction.
iMinds insights is a quarterly publication providing you with relevant tech updates based on interviews with academic and industry experts. iMinds is a digital research center and incubator based in Belgium.
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Knife of 3D Depth Sensing (AugmentedWorldExpo)
A talk from the Develop Track at AWE USA 2018 - the World's #1 XR Conference & Expo in Santa Clara, California May 30- June 1, 2018.
Mitchell Reifel (pmdtechnologies ag): pmd Time-of-Flight – the Swiss Army Knife of 3D depth sensing
pmd's Time-of-Flight technology is integrated into two AR smartphones on the market, and pmd ToF is in four AR headsets. This talk shows what pmd has achieved, what can be done with its 3D ToF technology, and why depth sensing is one secret sauce for AR, VR, and MR.
http://AugmentedWorldExpo.com
- Concept of Virtual Reality
- Virtual Reality: Components of a VR System, Types of VR Systems, 3D Position Trackers, Navigation and Manipulation Interfaces
- Visual Computation in Virtual Reality
- Augmented Reality
- Applications of VR
Generative AI models, such as ChatGPT and Stable Diffusion, can create new and original content like text, images, video, audio, or other data from simple prompts, as well as handle complex dialogs and reason about problems with or without images. These models are disrupting traditional technologies, from search and content creation to automation and problem solving, and are fundamentally shaping the future user interface to computing devices. Generative AI can apply broadly across industries, providing significant enhancements for utility, productivity, and entertainment. As generative AI adoption grows at record-setting speeds and computing demands increase, on-device and hybrid processing are more important than ever. Just like traditional computing evolved from mainframes to today’s mix of cloud and edge devices, AI processing will be distributed between them for AI to scale and reach its full potential.
In this presentation you’ll learn about:
- Why on-device AI is key
- Full-stack AI optimizations to make on-device AI possible and efficient
- Advanced techniques like quantization, distillation, and speculative decoding (a toy sketch of speculative decoding follows this list)
- How generative AI models can be run on device and examples of some running now
- Qualcomm Technologies’ role in scaling on-device generative AI
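Since the bullets above mention speculative decoding, here is a toy, model-free sketch of its accept/resample rule over next-token distributions. The distributions are random stand-ins; a real system would use a small draft LLM and a large target LLM, with the target evaluating all draft positions in one batched pass:

```python
import numpy as np

rng = np.random.default_rng(0)
V, K = 8, 4  # toy vocabulary size; number of draft tokens per round

def normalize(x):
    return x / x.sum()

# Stand-ins for model outputs: next-token distributions at each draft position.
q = [normalize(rng.random(V) + 0.1) for _ in range(K)]  # cheap draft model
p = [normalize(rng.random(V) + 0.1) for _ in range(K)]  # expensive target model

emitted = []
for i in range(K):
    x = rng.choice(V, p=q[i])                    # draft proposes a token
    if rng.random() < min(1.0, p[i][x] / q[i][x]):
        emitted.append(int(x))                   # accepted: matches target on average
    else:
        residual = normalize(np.maximum(p[i] - q[i], 0.0))
        emitted.append(int(rng.choice(V, p=residual)))  # resample from the residual
        break                                    # remaining draft tokens are discarded
print("tokens emitted this round:", emitted)
```

One round emits up to K tokens for a single expensive target-model pass, which is where the speedup comes from.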
As generative AI adoption grows at record-setting speeds and computing demands increase, hybrid processing is more important than ever. But just like traditional computing evolved from mainframes and thin clients to today’s mix of cloud and edge devices, AI processing must be distributed between the cloud and devices for AI to scale and reach its full potential. In this talk you’ll learn:
• Why on-device AI is key
• Which generative AI models can run on device
• Why the future of AI is hybrid
• Qualcomm Technologies’ role in making hybrid AI a reality
- There is a rich roadmap of 5G technologies coming in the second half of the 5G decade with the 5G Advanced evolution
- 6G will be the future innovation platform for 2030 and beyond building on the 5G Advanced foundation
- 6G will be more than just a new radio design, expanding the role of AI, sensing, and other technologies in the connected intelligent edge
- Qualcomm is leading cutting-edge wireless research across six key technology vectors on the path to 6G
5G is going mainstream across the globe, and this is an exciting time to harness the low latency and high capacity of 5G to enable the metaverse. A distributed-compute architecture across device and cloud can enable rich extended reality (XR) user experiences. Virtual reality (VR) and mixed reality (MR) are ready for deployment in private networks, while augmented reality (AR) for wide area networks can be enabled in the near term with Wi-Fi-powered AR glasses paired with a 5G-enabled phone. Device APIs that enable application adaptation are critical for a good user experience. 5G standards are evolving to support the deployment of AR glasses at large scale and are setting the stage for the 6G era, with the merging of the physical, digital, and virtual worlds. Techniques like perception-enhanced wireless offer significant potential to improve user experience. Qualcomm Technologies is enabling the XR industry with platforms, developer SDKs, and reference designs.
Check out this webinar to learn:
• How 5G and distributed-compute architectures enable the metaverse
• The latest results from our boundless XR 5G/6G testbed, including device APIs and perception-enhanced wireless
• 5G standards evolution for enhancing XR applications and the road to 6G
• How Qualcomm Technologies is enabling the industry with platforms, SDKs, and reference designs
AI model efficiency is crucial for making AI ubiquitous, leading to smarter devices and enhanced lives. Besides the performance benefit, quantized neural networks also increase power efficiency for two reasons: reduced memory access costs and increased compute efficiency.
The quantization work done by the Qualcomm AI Research team is crucial for implementing machine learning algorithms on low-power edge devices. In network quantization, we focus both on pushing the state of the art (SOTA) in compression and on making quantized inference as easy to access as possible. For example, our SOTA work on overcoming oscillations in quantization-aware training pushes the boundaries of what is possible with INT4 quantization. Furthermore, for ease of deployment, integer formats such as INT16 and INT8 give accuracy comparable to floating-point formats, i.e., FP16 and FP8, but with significantly better performance per watt. Researchers and developers can use this quantization research to successfully optimize and deploy their models across devices with open-source tools like the AI Model Efficiency Toolkit (AIMET).
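As a minimal sketch of the kind of INT8 quantization this research and AIMET automate (the generic asymmetric affine scheme below is the textbook version, not AIMET's internals):

```python
import numpy as np

def quantize_uint8(x: np.ndarray):
    """Asymmetric affine quantization of a float tensor to 8-bit integers."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = int(round(-lo / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values for accuracy evaluation."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.random.randn(64, 64).astype(np.float32)  # a toy weight tensor
q, s, z = quantize_uint8(w)
err = np.abs(w - dequantize(q, s, z)).max()
print(f"scale={s:.5f}, zero_point={z}, max abs error={err:.5f}")  # error ~ scale/2
```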
Presenters: Tijmen Blankevoort and Chirag Patel
How will sidelink bring a new level of 5G versatility? (Qualcomm Research)
Today, the 5G system mainly operates on a network-to-device communication model, exemplified by enhanced mobile broadband use cases where all data transmissions are between the network (i.e., base station) and devices (e.g., smartphone). However, to fully deliver on the original 5G vision of supporting diverse devices, services, and deployment scenarios, we need to expand the 5G topology further to reach new levels of performance and efficiency.
That is why sidelink communication was introduced in 3GPP standards, designed to facilitate direct communication between devices, independent of connectivity via the cellular infrastructure. Beyond automotive communication, it also benefits many other 5G use cases such as IoT, mobile broadband, and public safety.
Realizing mission-critical industrial automation with 5GQualcomm Research
Manufacturers seeking better operational efficiencies, with reduced downtime and higher yield, are at the leading edge of the Industry 4.0 transformation. With mobile system components and reliable wireless connectivity between them, flexible manufacturing systems can be reconfigured quickly for new tasks, to troubleshoot issues, or in response to shifts in supply and demand.
There is a long history of R&D collaboration between Bosch Rexroth and Qualcomm Technologies for the effective application of these 5G capabilities to industrial automation use cases. At the Robert Bosch Elektronik GmbH factory in Salzgitter, Germany, this collaboration has reached new heights.
Download this deck to learn how:
• Qualcomm Technologies and Bosch Rexroth are collaborating to accelerate the Industry 4.0 transformation
• 5G technologies deliver key capabilities for mission-critical industrial automation
• Distributed control solutions can work effectively across 5G TSN networks
• A single 5G technology platform solves connectivity and positioning needs for flexible manufacturing
3GPP Release 17: Completing the first phase of 5G evolution (Qualcomm Research)
This presentation summarizes the 5G NR Release 17 projects completed in March 2022. The release further enhances the 5G foundation and expands it into new devices, use cases, and verticals.
AI firsts: Leading from research to proof-of-concept (Qualcomm Research)
AI has made tremendous progress over the past decade, with many advancements building on fundamental research from decades ago. Accelerating the pipeline from research to commercialization has been daunting, since scaling technologies in the real world poses many challenges beyond the theoretical work done in the lab. Qualcomm AI Research has taken on the task of not only generating novel AI research but also being first to demonstrate proof-of-concepts on commercial devices, enabling technology to scale in the real world. This presentation covers:
- The challenges of deploying cutting-edge research on real-world mobile devices
- How Qualcomm AI Research is solving system and feasibility challenges with full-stack optimizations to quickly move from research to commercialization
- Examples where Qualcomm AI Research has had industrial or academic firsts
Setting off the 5G Advanced evolution with 3GPP Release 18 (Qualcomm Research)
In December 2021, 3GPP reached consensus on the scope of 5G NR Release 18. This is a significant milestone marking the beginning of 5G Advanced — the second wave of wireless innovations that will fulfill the 5G vision. Release 18 builds on the solid foundation set by Releases 15, 16, and 17, and it sets the longer-term evolution direction of 5G and beyond. The release encompasses a wide range of new and enhancement projects, ranging from improved MIMO and the application of an AI/ML-enabled air interface to extended reality optimizations and broader IoT support.
Cellular networks have facilitated positioning, in addition to voice and data communications, from the beginning with 2G, and we’ve since grown to rely on positioning technology to make our lives safer, simpler, more productive, and even fun. Cellular positioning complements other technologies to operate indoors and outdoors, including in dense urban environments where tall buildings interfere with satellite positioning. It works whether we’re standing still, walking, or in a moving vehicle. With 5G, cellular positioning breaks new ground to bring robust, precise positioning indoors and outdoors, meeting even the most demanding Industry 4.0 needs.
As we look to the future, the Connected Intelligent Edge will bring a new dimension of positional insight to a broad range of devices, improving wireless use cases still under development. We’re already charting the course to 5G Advanced and beyond by working on the evolution of cellular positioning technology to include RF sensing for situational awareness.
Download the deck to learn more.
The need for intelligent, personalized experiences powered by AI is ever-growing. Our devices are producing more and more data that could help improve our AI experiences. How do we learn and efficiently process all this data from edge devices while maintaining privacy? On-device learning rather than cloud training can address these challenges. In this presentation, we’ll discuss:
- Why on-device learning is crucial for providing intelligent, personalized experiences without sacrificing privacy
- Our latest research in on-device learning, including few-shot learning, continuous learning, and federated learning
- How we are solving system and feasibility challenges to move from research to commercialization
Data compression has improved by leaps and bounds over the years thanks to technical innovation, enabling the proliferation of streamed digital multimedia and voice over IP. For example, a regular cadence of technical advancement in video codecs has led to massive reductions in file size – up to 1000x when comparing a raw video file to a VVC-encoded file. However, with the rise of machine learning techniques and diverse data types to compress, AI may be a compelling tool for next-generation compression, offering a variety of benefits over traditional techniques. In this presentation we discuss:
- Why the demand for improved data compression is growing
- Why AI is a compelling tool for compression in general
- Qualcomm AI Research’s latest AI voice and video codec research
- Our future AI codec research work and challenges
Artificial Intelligence (AI), specifically deep learning, is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences. However, the deep neural networks of today use too much memory, compute, and energy. To make AI truly ubiquitous, it needs to run on the end device within tight power and thermal budgets. Advancements in multiple areas are necessary to improve AI model efficiency, including quantization, compression, compilation, and neural architecture search (NAS). In this presentation, we’ll discuss:
- Qualcomm AI Research’s latest model efficiency research
- Our new NAS research to optimize neural networks more easily for on-device efficiency
- How the AI community can take advantage of this research through our open-source projects, such as the AI Model Efficiency Toolkit (AIMET) and AIMET Model Zoo
How to build high-performance 5G networks with vRAN and O-RAN (Qualcomm Research)
5G networks are poised to deliver an unprecedented amount of data from a richer set of use cases than we have ever seen. This makes efficient networking in terms of scalability, cost, and power critical for the sustainable growth of 5G. Cloud technologies such as virtualization, containerization and orchestration are now powering a surge of innovation in virtualized radio access network (vRAN) infrastructure with modular hardware and software components, and standardized interfaces. While commercial off-the-shelf (COTS) hardware platforms provide the compute capacity for running vRAN software, hardware accelerators will also play a major role in offloading real-time and complex signal processing functions. Together, COTS platforms and hardware accelerators provide the foundation for building the intelligent 5G network and facilitate innovative new use cases with the intelligent wireless edge.
This presentation takes a look at the technology roadmap for 5G NR millimeter wave (mmWave), including features such as integrated access and backhaul (IAB) and enhancements in beam management, mobility, coverage, and more. For more information, please visit www.qualcomm.com/mmwave
Video data is abundant and being generated at ever increasing rates. Analyzing video with AI can provide valuable insights and capabilities for many applications ranging from autonomous driving and smart cameras to smartphones and extended reality. However, as video resolution and frame rates increase while AI video perception models become more complex, running these workloads in real time is becoming more challenging. This presentation explores the latest research that is enabling efficient video perception while maintaining neural network model accuracy. You’ll learn about:
- How video perception is crucial for understanding the world and making devices smarter
- The challenges of on-device real-time video perception at high resolution through AI
- Qualcomm AI Research’s latest research and techniques for efficient video perception
Check out: https://www.qualcomm.com/AI
Enabling the rise of the smartphone: Chronicling the developmental history at... (Qualcomm Research)
Today’s smartphones are a marvel of modern technology — handheld devices with vast computing power, incredible multimedia and AI capabilities, and blazing fast data rates that support mobile browsing, social media interaction, and more. From humble beginnings as a cellphone focused purely on voice communication, the capability and functionality of modern smartphones have advanced tremendously. This presentation chronicles Qualcomm’s role in the rise of the smartphone from its initial beginnings to becoming the largest computing platform in the world. It includes:
- Key technology developments that led to today’s smartphones
- The role of Moore’s Law in driving new innovations and additional integration into mobile processors
- Qualcomm’s critical role in advancing the smartphone’s capabilities through groundbreaking innovations and key technology developments
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to market, combined with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today, organizations feel more susceptible to external and internal cyber threats due to the vast attack surface of their application supply chains and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
State of ICS and IoT Cyber Threat Landscape Report 2024 preview (Prayukth K V)
The IoT and OT threat landscape report has been prepared by the Threat Research Team at Sectrio using data from Sectrio's cyber threat intelligence farming facilities spread across over 85 cities around the world. In addition, Sectrio runs AI-based advanced threat and payload engagement facilities that serve as sinks to attract and engage sophisticated threat actors and newer malware, including new variants and latent threats that are at an earlier stage of development.
The latest edition of the OT/ICS and IoT security Threat Landscape Report 2024 also covers:
- State of global ICS asset and network exposure
- Sectoral targets and attacks, as well as the cost of ransom
- Global APT activity, AI usage, actor and tactic profiles, and implications
- Rise in volumes of AI-powered cyberattacks
- Major cyber events in 2024
- Malware and malicious payload trends
- Cyberattack types and targets
- Vulnerability exploit attempts on CVEs
- Attacks on counties – USA
- Expansion of bot farms – how, where, and why
- In-depth analysis of the cyber threat landscape across North America, South America, Europe, APAC, and the Middle East
- Why are attacks on smart factories rising?
- Cyber risk predictions
- Axis of attacks – Europe
- Systemic attacks in the Middle East
Download the full report from here:
https://sectrio.com/resources/ot-threat-landscape-reports/sectrio-releases-ot-ics-and-iot-security-threat-landscape-report-2024/
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell us all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details of how to best design a sturdy architecture within ODC.
Software Delivery at the Speed of AI: Inflectra Invests in AI-Powered Quality (Inflectra)
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We closed with a lovely workshop in which participants explored different ways to think about quality and testing across the DevOps infinity loop.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology pushes into IT, I asked myself, as an “infrastructure container Kubernetes guy”, how this fancy AI technology gets managed from an infrastructure operations point of view. Is it possible to apply our lovely cloud-native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply them to our own infrastructure and make them work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into the approaches I have already gotten working for real.
Epistemic Interaction - tuning interfaces to provide information for AI supportAlan Dix
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
Today's agenda
• Advantages of 3D perception over 2D
• The need for 3D perception across applications
• Advancements in 3D perception by Qualcomm AI Research
• Future 3D perception research directions
• Questions?
We perceive the world in 3D
3D perception offers many benefits and new capabilities over 2D:
• 3D structure is more reliable than a 2D image for perception
• 3D provides confident cues for object and scene recognition
• 3D allows accurate size, pose, and motion estimation
• 3D is needed for rendering by light and RF signals
3D data acquisition
Multiple methods offering different benefits.
Active sensing: illuminating a scene and reconstructing 3D from the reflected signals.
• Acquisition methods: light (time-of-flight, structured light, LiDAR); RF (radar, SAR, THz imaging); other (CT scan, MRI, ultrasound, …)
• Benefits: works in the dark and in various conditions
Passive sensing: determining 3D from signals emitted by an uncontrolled illuminator.
• Computational methods: geometry (traditional CV using multiple cameras: DFS, MVS); inference (via shadow, blur, and motion parallax cues); regression (from appearance to 3D, AI driven)
• Benefits: lower cost and lower power
A short example of the underlying stereo geometry follows below.
DFS: Depth From Stereo; MVS: Multi-View Stereo; SAR: Synthetic-Aperture Radar
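Where depth is computed geometrically from a calibrated stereo pair, the core relationship is that depth is inversely proportional to disparity. A minimal illustrative sketch; the focal length and baseline values are hypothetical, not from the presentation:

```python
# Minimal sketch: recovering depth from stereo disparity (DFS).
# The focal length and baseline below are illustrative assumptions.
import numpy as np

def disparity_to_depth(disparity_px, focal_px=1000.0, baseline_m=0.12):
    """depth (m) = focal length (px) * baseline (m) / disparity (px)."""
    disparity_px = np.asarray(disparity_px, dtype=np.float64)
    depth = np.full_like(disparity_px, np.inf)
    valid = disparity_px > 0          # zero disparity means infinitely far away
    depth[valid] = focal_px * baseline_m / disparity_px[valid]
    return depth

# A 20 px disparity with f = 1000 px and a 12 cm baseline gives 6 m of depth.
print(disparity_to_depth([20.0]))    # [6.]
```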
3D perception empowers autonomous driving
• 3D map reconstruction
• Vehicle positioning
• Finding navigable road surfaces and avoiding obstacles
• Detecting and estimating trajectories of vehicles, pedestrians, and other objects for path planning
[Diagram: a configurable set of calibrated camera, radar, and LiDAR sensors on the vehicle, including long-range, mid-range, and near cameras, side and rear-side cameras, long-range and short-range radars, and C-V2X connectivity.]
3D perception is important for robotics
Situational awareness
• Scene and human interaction
• Object, scene, and motion recognition
Motion planning and navigation
• Obstacle avoidance
• Delivery and factory automation
Pick-and-place and assembly
• Grabbing and placing objects
• Putting things together
3D perception vastly improves computational photography and cinematography
• Image quality: denoising, deblurring, relighting, HDR, etc.
• Filtering effects: bokeh, depth-of-field, etc.
• Content editing: seamless object removal, beautification, etc.
3D perception is highly valuable in many more areas
• Access control and surveillance
• E-commerce, virtual try-out, and virtual walkthrough
• Inspection and asset control
• Inventory, port, mining, and construction management
• Medicine and biology
3D perception is crucial for understanding the 3D world
• Object detection: finding positions and regions of individual objects
• Scene understanding: decomposing a scene into its 3D and physical components
• Pose estimation: finding orientation and key-points of objects
• Depth estimation & 3DR: creating 3D models of scenes and objects from 2D images
• Neuro SLAM: AI for simultaneous localization & mapping
• 3D scene in RF: scene understanding using RF signals
• Neural radiance fields: networks to synthesize virtual views
• 3D imitation learning: learning 3D robotics tasks from human videos
3DR: 3D Reconstruction
Leading 3D perception research by Qualcomm AI Research
Novel AI techniques for 3D perception, full-stack AI optimizations to enable real-world deployments, and an energy-efficient platform to make 3D perception ubiquitous.
• Depth estimation & 3DR: supervised and self-supervised learning for mono and stereo with transformers; world's first real-time monocular depth estimation on a phone that can create 3D from a single image
• Object detection: efficient neural architectures that leverage sparsity and polar spaces; top-accuracy detection of vehicles, pedestrians, and traffic signs on LiDAR 3D point clouds
• Pose estimation: efficient networks that can interpret 3D human body pose and hand pose from 2D images; computationally scalable architecture that iteratively improves key-point detection with less than 5 mm error
• Scene understanding: end-to-end trained pipeline for room-layout, surface normal, albedo, material, object, and lighting estimation; world's first transformer-based solution for indoor scenes that enables photorealistic object insertion
Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
3D perception challenges
Data challenges
• Sparse vs volumetric nature of 3D point clouds: image pixels are arranged on a uniform grid, while a 3D point cloud faces an accessibility vs memory trade-off (illustrated in the sketch below)
• Incompleteness in 3D acquisition
• Availability of high-quality 3D video datasets
Implementation challenges
• HW/SW platform (memory, SDKs, tools)
• Manipulation and viewpoint management
• Computational load (training/inference)
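To make the accessibility-vs-memory trade-off concrete, here is a toy comparison between a dense voxel grid (image-like O(1) indexing, but memory grows with the whole volume) and a sparse hash map (memory tracks occupancy, but access needs a lookup). Grid sizes and point counts are assumptions for illustration:

```python
# Toy illustration of the accessibility vs memory trade-off for 3D data.
# Sizes are assumptions for illustration, not figures from the presentation.
import numpy as np

side = 512                                   # voxels per axis
dense = np.zeros((side, side, side), dtype=np.float32)
print(f"dense grid: {dense.nbytes / 2**20:.0f} MiB")      # 512 MiB, one channel

points = np.random.rand(100_000, 3) * side   # a LiDAR-scale point cloud
sparse = {tuple(v): 1.0 for v in points.astype(np.int64)}
print(f"sparse map: ~{len(sparse):,} occupied voxels")    # memory tracks occupancy
```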
Enabling AI-based self-supervised depth estimation from a single image
No need for annotated data
• Self-supervised learning from unlabeled monocular videos
• Utilizes geometric relationships across video frames
Builds on semantics1
• Significantly improves accuracy by using semantic segmentation2
Works on any neural network architecture
• Modifies only training and requires no additional inference computation
Enables automatic domain adaptation
[Diagram: self-supervised training pipeline of X-Distill. The depth network predicts depth D_t for target frame I_t; a 6DoF network estimates the pose T_t→s to source frame I_s; view synthesis warps I_s into Î_t for a photometric loss, with a smoothness loss on depth (geometric guidance); a pretrained segmentation network supervises a depth-to-segmentation network via a segmentation loss (semantic guidance). The test pipeline runs the depth network alone. A sketch of this objective follows below.]
1: X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation, BMVC 2021
2: InverseForm: A Loss Function for Structured Boundary-Aware Segmentation, CVPR 2021
Raw video obtained from Cityscapes Benchmark: https://www.cityscapes-dataset.com/
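The following is a minimal PyTorch sketch of this kind of objective; the function names, shapes, and weighting are illustrative assumptions, not the paper's code. It shows the three ingredients named in the diagram: a photometric loss on the synthesized view, an edge-aware smoothness loss on depth, and a segmentation loss distilled from a pretrained teacher.

```python
# Minimal PyTorch sketch of an X-Distill-style self-supervised objective.
# Shapes, weights, and names are illustrative assumptions, not the paper's code.
import torch
import torch.nn.functional as F

def photometric_loss(target, synthesized):
    # L1 error between target frame I_t and the warped source frame Î_t.
    # (A full pipeline warps I_s with predicted depth D_t and pose T_t→s,
    # and typically blends L1 with an SSIM term.)
    return (target - synthesized).abs().mean()

def smoothness_loss(depth, image):
    # Edge-aware smoothness: penalize depth gradients except at image edges.
    dx_d = (depth[..., :, 1:] - depth[..., :, :-1]).abs()
    dy_d = (depth[..., 1:, :] - depth[..., :-1, :]).abs()
    dx_i = (image[..., :, 1:] - image[..., :, :-1]).abs().mean(1, keepdim=True)
    dy_i = (image[..., 1:, :] - image[..., :-1, :]).abs().mean(1, keepdim=True)
    return (dx_d * torch.exp(-dx_i)).mean() + (dy_d * torch.exp(-dy_i)).mean()

def segmentation_guidance_loss(student_logits, teacher_labels):
    # Semantic guidance: a depth-to-segmentation head is supervised by the
    # labels of a pretrained segmentation network (cross-task distillation).
    return F.cross_entropy(student_logits, teacher_labels)

# Toy check: total = photometric + λ_s * smoothness + λ_seg * segmentation.
d, img = torch.rand(1, 1, 8, 8), torch.rand(1, 3, 8, 8)
print(photometric_loss(img, img).item(), smoothness_loss(d, img).item())
```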
Achieving SOTA accuracy with 90% less computation
[Qualitative comparison on Cityscapes video: PackNet, HR-Depth, and ours. Raw video obtained from Cityscapes Benchmark: https://www.cityscapes-dataset.com/]
Improved on-device efficiency with neural architecture search
Distilling Optimal Neural Architectures (DONNA)1: starting from a pre-trained model, a search space and accuracy-prediction training feed a HW-aware search that yields a set of Pareto-optimal network architectures.
• Diverse search: supports diverse spaces with different neural operations to find the best models
• Scalable: scales to many HW devices at minimal cost
• Low cost: low start-up cost of 1,000 to 4,000 epochs, equivalent to training 2 to 10 networks from scratch
• Reliable: uses direct HW measurements instead of a potentially inaccurate HW model
• Hardware aware: 20-40% faster models
Enabling real-time on-device use cases. A toy sketch of this style of search follows below.
1: Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces, ICCV 2021
Snapdragon is a product of Qualcomm Technologies, Inc. and/or its subsidiaries.
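As a rough intuition for this style of search, here is a toy, hypothetical loop (not the DONNA algorithm itself): a cheap trained predictor scores candidate architectures for accuracy, latency comes from direct measurement on the target hardware, and the Pareto front over both is kept.

```python
# Hypothetical sketch of predictor-guided, hardware-aware architecture search.
# Everything here is a stand-in: not DONNA's search space or algorithm.
import random

def predicted_accuracy(arch):
    # Stand-in for a trained accuracy predictor over block-wise features.
    return sum(arch) / len(arch) + random.gauss(0, 0.01)

def measured_latency_ms(arch):
    # Stand-in for a direct measurement on the target device.
    return 5.0 + 2.0 * sum(arch)

def pareto_front(candidates):
    # Keep architectures not dominated in (accuracy up, latency down).
    return [a for a in candidates
            if not any(b["acc"] >= a["acc"] and b["lat"] <= a["lat"] and b != a
                       for b in candidates)]

candidates = []
for _ in range(200):
    arch = [random.choice([0.0, 0.5, 1.0]) for _ in range(5)]  # toy search space
    candidates.append({"arch": arch,
                       "acc": predicted_accuracy(arch),
                       "lat": measured_latency_ms(arch)})
print(f"{len(pareto_front(candidates))} Pareto-optimal architectures found")
```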
Running monocular depth estimation in real time on a Snapdragon mobile platform

Model                             Parameters (M)   FPS   Error
Monodepth2 (RN50), quantized      32.0             23    0.83
Our X-Distill (RN50), quantized   32.0             23    0.71   (more accurate model)
Our X-Distill DONNA, quantized1   3.7              35    0.75   (more efficient model; ~10X reduction in parameters)

• Quantization through AI Model Efficiency Toolkit (AIMET); a generic quantization sketch follows below
• 30+ FPS by using a more efficient backbone optimized via DONNA neural architecture search
1: X-Distill/DONNA demo at NeurIPS 2021
AIMET is a product of Qualcomm Innovation Center, Inc. Raw video obtained from Cityscapes Benchmark: https://www.cityscapes-dataset.com/
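For context on the quantization step, here is a generic post-training quantization sketch in plain PyTorch. It only stands in for the idea of storing weights in 8-bit; AIMET's actual API and workflow differ, so treat this as an assumption-laden illustration rather than the AIMET flow:

```python
# Generic post-training quantization sketch (plain PyTorch, not AIMET).
import torch
import torch.nn as nn

# Toy regression head; PyTorch dynamic quantization targets nn.Linear modules.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1)).eval()
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)    # weights stored as int8

x = torch.randn(1, 64)
print(quantized(x).shape)                     # torch.Size([1, 1])
```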
Enhancing the model with a new efficient transformer architecture
A novel transformer architecture leverages spatial self-attention for the depth network:
• Smaller model that runs in real time
• Improved 3D recall in reconstructed meshes
During training:
• Self-supervised learning for fine-tuning transformers to improve temporal consistency and eliminate texture copy
• Sparse depth from 6DoF to resolve global scale issues
[Diagram: per-frame depth networks feed a unified mesh; each depth network is an encoder-decoder whose decoder applies position embedding, affinity/context features, and multi-head attention. A minimal sketch of the attention step follows below.]
6DoF: 6 degrees-of-freedom
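A minimal sketch of the spatial self-attention step, with assumed dimensions (this is the generic operation, not the presented architecture): feature maps are flattened into patch tokens so that every spatial location can attend to every other.

```python
# Minimal sketch of spatial self-attention over patch tokens (assumed shapes).
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feat):                       # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2)   # (B, H*W, C) patch tokens
        attended, _ = self.attn(tokens, tokens, tokens)
        tokens = self.norm(tokens + attended)      # residual + layer norm
        return tokens.transpose(1, 2).reshape(b, c, h, w)

out = SpatialSelfAttention()(torch.randn(1, 64, 8, 8))
print(out.shape)  # torch.Size([1, 64, 8, 8])
```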
Our transformer-based model is both smaller and more accurate: a 26x smaller model fits on a phone processor and runs in real time on the Qualcomm® Hexagon™ Processor.

Model                            Parameters (M)   3DR recall
Monodepth2 (RN34)                15.0             0.653
DPT (MiDaS, transformer based)   123.0            0.926
Ours (transformer based)         4.7              0.930

[Qualitative comparison: image from an XR camera, the Monodepth2 (RN34) model, and our transformer-based model.]
Qualcomm Hexagon is a product of Qualcomm Technologies, Inc. and/or its subsidiaries.
Our transformer-based model achieves better visual quality than the current SOTA
[Qualitative comparison of reconstructed meshes: time-of-flight sensor, Monodepth2, NeuralRecon, and ours.]
From single-image to stereo depth estimation for increased accuracy
Similar to human visual perception, a highly accurate disparity estimation model uses two images (left/right):
• Fits into memory by just-in-time computation
• Geometry is much sharper and more detailed
• Greater generalizability
The models leverage Snapdragon's heterogeneous computing:
• Real-time performance
• Subpixel disparity accuracy
Extends to motion parallax.
[Diagram: left and right feature encoders plus a context encoder feed just-in-time cost volumes; an RNN repeatedly looks up correlations to refine the disparity. A sketch of the look-up step follows below.]
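A hedged sketch of the just-in-time look-up idea, in the style of RAFT-Stereo and with simplified shapes (not the presented model): instead of materializing a full H x W x D cost volume, the correlation is sampled only at the disparities the recurrent update currently needs.

```python
# Sketch of a just-in-time cost-volume look-up (simplified, RAFT-Stereo style).
import torch

def jit_correlation(left_feat, right_feat, disparity):
    """left_feat, right_feat: (B, C, H, W); disparity: (B, 1, H, W).
    Returns the per-pixel correlation at the current disparity estimate."""
    b, c, h, w = left_feat.shape
    # Build a sampling grid: shift x-coordinates left by the disparity.
    xs = torch.arange(w, device=left_feat.device).view(1, 1, 1, w).float()
    ys = torch.arange(h, device=left_feat.device).view(1, 1, h, 1).float()
    x_sample = xs.expand(b, 1, h, w) - disparity
    grid = torch.stack([2 * x_sample / (w - 1) - 1,        # normalize to [-1, 1]
                        (2 * ys / (h - 1) - 1).expand(b, 1, h, w)], dim=-1)
    sampled = torch.nn.functional.grid_sample(
        right_feat, grid.squeeze(1), align_corners=True)    # (B, C, H, W)
    return (left_feat * sampled).sum(dim=1, keepdim=True) / c

f_l, f_r = torch.randn(1, 8, 16, 16), torch.randn(1, 8, 16, 16)
print(jit_correlation(f_l, f_r, torch.zeros(1, 1, 16, 16)).shape)  # (1, 1, 16, 16)
```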
Our JiT stereo depth estimation model runs in real time, 20x faster than SOTA with similar accuracy on the Hexagon Processor.
[Qualitative result on a left/right image pair.]
JiT: Just-in-time; SOTA: RAFT-Stereo
Enabling efficient object detection in 3D point clouds
Fast Polar Attentive 3D Object Detection on LiDAR Point Clouds, NeurIPS MLAD workshop, 2021
A transformer-based architecture:
• Leverages 2D pseudo-image features extracted in the polar space
• Reduces latency and memory usage without sacrificing detection accuracy
• Sectors the data for stream-wise processing to make predictions without requiring a complete 360° LiDAR scan
[Diagram: polar features pass through patch embedding and positional encoding into a transformer backbone with encoder inputs and queries. A toy sketch of the polar binning follows below.]
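As a toy illustration of the polar pseudo-image idea (bin counts and ranges are assumptions, not the paper's configuration): points are binned by range and azimuth, which matches the LiDAR's rotational sweep and is what makes sector-by-sector, stream-wise processing natural.

```python
# Toy sketch: binning a point cloud into a 2D pseudo-image in polar space.
# Bin counts and max range are illustrative assumptions.
import numpy as np

def polar_pseudo_image(points, n_range=64, n_azimuth=256, max_range=80.0):
    """points: (N, 3) array of x, y, z. Returns (n_range, n_azimuth) occupancy."""
    x, y = points[:, 0], points[:, 1]
    rng = np.sqrt(x**2 + y**2)
    azi = np.arctan2(y, x)                        # in [-pi, pi]
    r_bin = np.clip((rng / max_range * n_range).astype(int), 0, n_range - 1)
    a_bin = np.clip(((azi + np.pi) / (2 * np.pi) * n_azimuth).astype(int),
                    0, n_azimuth - 1)
    image = np.zeros((n_range, n_azimuth), dtype=np.float32)
    np.add.at(image, (r_bin, a_bin), 1.0)         # count points per polar cell
    return image

cloud = np.random.randn(10_000, 3) * 20.0
print(polar_pseudo_image(cloud).sum())            # all 10,000 points binned
```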
Smaller, faster, lower-power model that achieves the top precision

Model          Parameters (M)   GFLOPs   Inference time (ms)   Accuracy (AP)
PointRCNN      2.20             25       620                   90.34
PV-RCNN        13.10            69       80                    92.24
ComplexYolo    65.50            31       19                    75.32
PointPillars   1.43             32       16                    88.36
Ours           0.59             6        14                    94.70

Fast Polar Attentive 3D Object Detection on LiDAR Point Clouds, NeurIPS MLAD workshop, 2021
Dynamic refinements to reduce size and latency for hand pose estimation
Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation, WACV 2022
• Lightweight architecture: applied recursively while incorporating attention and gating for dynamic refinement
• Eliminates the need for precise hand detection
[Diagram: a backbone feeds a pose predictor (FC layers) that outputs 2D and 3D pose through a HAND model; variance maps and an attention map drive a refiner in an iterative refinement loop. Qualitative comparison: ground truth vs. ours. A simplified sketch of the refinement loop follows below.]
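A simplified, hypothetical sketch of the dynamic iterative refinement pattern (not the WACV 2022 model): a lightweight refiner is applied recursively as a residual update, and a gating score lets confident predictions exit early to save computation.

```python
# Hypothetical sketch of dynamic iterative refinement with gating.
import torch
import torch.nn as nn

class IterativeRefiner(nn.Module):
    def __init__(self, dim=21 * 3, max_iters=4):
        super().__init__()
        self.refine = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(),
                                    nn.Linear(128, dim))
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.max_iters = max_iters

    def forward(self, pose):                     # pose: (B, 21*3) joint coords
        for _ in range(self.max_iters):
            pose = pose + self.refine(pose)      # residual refinement step
            if self.gate(pose).mean() > 0.9:     # confident enough: stop early
                break
        return pose

coarse = torch.randn(2, 63)                      # 21 hand joints in 3D
print(IterativeRefiner()(coarse).shape)          # torch.Size([2, 63])
```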
Significant reduction in GFLOPs while achieving better accuracy

Methods      AUC (20-50)   GFLOPs   #Params
Z&B          0.948         78.2     -
Liu et al.   0.964         16.0     -
HAMR         0.982         8.0      -
Cai et al.   0.995         6.2      4.53M
Fan et al.   0.996         1.6      4.76M
Ours         0.997         1.3      1.68M

Methods        AUC (0-50)   GFLOPs   #Params
Tekin et al.   0.653        13.62    14.31M
Fan et al.     0.731        1.37     5.76M
Ours           0.768        0.28     0.46M

Our method also achieves the best average 3D error: 9.76 mm (SOTA) → 7.24 mm (ours).
[Heatmaps for hand landmark points from our model.]
AUC: area under curve
Dynamic Iterative Refinement for Efficient 3D Hand Pose Estimation, WACV 2022
World's first transformer-based inverse rendering for scene understanding
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes, CVPR 2022, in collaboration with Professor Manmohan Chandraker
Estimates physically-based scene attributes from an indoor image:
• End-to-end trained pipeline for room-layout, surface normal, albedo, material, object, and lighting estimation
• Leads to better handling of global interactions between scene components, achieving better disambiguation of shape, material, and lighting
• SOTA results on all 3D perception tasks, enabling high-quality AR applications such as object insertion
[Architecture diagram: a shared backbone tokenizes the image; dense multi-headed transformer encoder-decoders (BRDFGeoNet) regress albedo, roughness, depth, and normals, while LightNet, an encoder-decoder paired with a renderer, estimates lighting. A simplified sketch of the shared-backbone, multi-head pattern follows below.]
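A simplified sketch of the shared-backbone, multi-head pattern the slide describes, with assumed dimensions (not the IRISformer code): one transformer processes shared patch tokens so global interactions are modeled once, and small heads regress each scene attribute.

```python
# Sketch of a shared token backbone with per-attribute regression heads.
# Dimensions and head shapes are illustrative assumptions.
import torch
import torch.nn as nn

dim, n_tokens = 64, 16 * 16
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
    num_layers=2)
heads = nn.ModuleDict({
    "albedo": nn.Linear(dim, 3), "roughness": nn.Linear(dim, 1),
    "depth": nn.Linear(dim, 1), "normals": nn.Linear(dim, 3)})

tokens = torch.randn(1, n_tokens, dim)     # patch tokens from an image
shared = backbone(tokens)                  # global interactions via attention
outputs = {name: head(shared) for name, head in heads.items()}
print({k: tuple(v.shape) for k, v in outputs.items()})
```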
Our model finds more accurate scene attributes compared to SOTA
[Qualitative comparison; small insets are refined estimations obtained by further post-processing with bilateral solvers (BS).]
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes, CVPR 2022, in collaboration with Professor Manmohan Chandraker
The transformer automatically learns attention maps that determine the important areas of the image
[Attention maps (albedo) over two input images. For the first, attention focuses on: 1. chairs and window; 2. highlighted regions over the image; 3. the entire floor; 4. the chair itself. For the second: 1. neighboring shadowed areas of the wall; 2. the entire wall; 3. potential light sources and occluders; 4. the ambient environment.]
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes, CVPR 2022, in collaboration with Professor Manmohan Chandraker
Our method correctly estimates lighting to realistically insert objects
[Comparison of object insertion results: Barron et al. 2013, Gardner et al. 2017, Li et al. 2021, and ours.]
IRISformer: Dense Vision Transformers for Single-Image Inverse Rendering in Indoor Scenes, CVPR 2022, in collaboration with Professor Manmohan Chandraker
Coming soon: 3D perception innovations
Neuro-SLAM
• Greatly facilitates XR, autonomous vehicles, and robotic vision
• Our goal is to create 3D maps in real time on the device
3D imitation learning
• Learning complex dexterous skills in 3D spaces from human videos
• Our goal is to enable such learning in real time
3D scene understanding in RF (Wi-Fi/5G)
• Our goal is to achieve floorplan estimation and human detection, tracking, and pose estimation using only RF signals
Neural radiance fields (NeRF)
• Single object/scene → single model: virtual images from any viewpoint (see the rendering sketch below)
• Our goal is to run NeRF in real time on mobile platforms
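As background on what a NeRF evaluates per pixel, here is a minimal sketch of the standard volume-rendering quadrature (illustrative only): a ray is sampled at S points, the field assigns each sample a density and color, and the pixel color is the transmittance-weighted sum along the ray.

```python
# Minimal sketch of the NeRF volume-rendering quadrature for one ray.
import torch

def render_ray(densities, colors, deltas):
    """densities: (S,), colors: (S, 3), deltas: (S,) sample spacings."""
    alpha = 1.0 - torch.exp(-densities * deltas)           # per-sample opacity
    trans = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    weights = trans * alpha                                # contribution per sample
    return (weights.unsqueeze(-1) * colors).sum(dim=0)     # (3,) pixel color

s = 64
rgb = render_ray(torch.rand(s), torch.rand(s, 3), torch.full((s,), 0.05))
print(rgb)   # rendered color for one ray
```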
We are conducting leading research to enable 3D perception. Thanks to our full-stack AI research, we are first to demonstrate 3D perception proofs-of-concept on edge devices. We are solving system and feasibility challenges to move from research to commercialization.