Artificial Intelligence (AI), specifically deep learning, is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences. However, the deep neural networks of today use too much memory, compute, and energy. To make AI truly ubiquitous, it needs to run on the end device within tight power and thermal budgets. Advancements in multiple areas are necessary to improve AI model efficiency, including quantization, compression, compilation, and neural architecture search (NAS). In this presentation, we’ll discuss:
- Qualcomm AI Research’s latest model efficiency research
- Our new NAS research to optimize neural networks more easily for on-device efficiency
- How the AI community can take advantage of this research through our open-source projects, such as the AI Model Efficiency Toolkit (AIMET) and AIMET Model Zoo
AI model efficiency is crucial for making AI ubiquitous, leading to smarter devices and enhanced lives. Besides the performance benefit, quantized neural networks also increase power efficiency for two reasons: reduced memory access costs and increased compute efficiency.
The quantization work done by the Qualcomm AI Research team is crucial for implementing machine learning algorithms on low-power edge devices. In network quantization, we focus both on pushing the state of the art (SOTA) in compression and on making quantized inference as easy to access as possible. For example, our SOTA work on overcoming oscillations in quantization-aware training pushes the boundaries of what is possible with INT4 quantization. Furthermore, for ease of deployment, integer formats such as INT16 and INT8 give accuracy comparable to floating-point formats such as FP16 and FP8, but with significantly better performance per watt. Researchers and developers can use this quantization research to optimize and deploy their models across devices with open-sourced tools like the AI Model Efficiency Toolkit (AIMET).
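To make the mechanics concrete, here is a minimal sketch of post-training affine quantization of a weight tensor to INT8. This is a generic textbook formulation, not AIMET's actual API; the function names are ours for illustration.

```python
import numpy as np

def quantize_affine(x, num_bits=8):
    """Map float values to integers using a per-tensor scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize_affine(q, scale, zero_point):
    """Recover approximate float values from the integer representation."""
    return scale * (q.astype(np.float32) - zero_point)

weights = np.random.randn(1000).astype(np.float32)
q, scale, zp = quantize_affine(weights)
error = np.abs(weights - dequantize_affine(q, scale, zp)).max()
print(f"scale={scale:.6f}, zero_point={zp}, max error={error:.6f}")
```

Toolkits like AIMET automate this kind of range selection and layer techniques such as cross-layer equalization and AdaRound on top of it.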
Presenters: Tijmen Blankevoort and Chirag Patel
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI - Qualcomm Research
How do you find the best solution when faced with many choices? Combinatorial optimization is a field of mathematics that seeks optimal solutions for complex problems involving multiple variables. Numerous business verticals can benefit from combinatorial optimization, whether transport, supply chain, or the mobile industry.
More recently, we’ve seen gains from applying AI to combinatorial optimization, making the method more scalable and significantly cheaper. This approach replaces the manual tuning of traditional heuristic approaches with an AI agent that provides fast metric estimation.
In this presentation you will find out:
Why AI is crucial in combinatorial optimization
How it can be applied to two use cases: improving chip design and hardware-specific compilers
The state-of-the-art results achieved by Qualcomm AI Research
5G + AI: The Ingredients For Next Generation Wireless Innovation - Qualcomm Research
5G and AI are two of the most disruptive technologies the world has seen in decades. While each is individually revolutionizing industries and enabling new experiences, the combination of both 5G and AI is going to be truly transformative. Applying AI not only to the 5G network but also the device will lead to more efficient wireless communications, longer battery life and enhanced user experiences. The low latency and high capacity of 5G will also allow AI processing to be distributed amongst the device, edge cloud and central cloud, enabling flexible system solutions for a variety of use cases. At Qualcomm Technologies, we are not only working on cutting-edge research for 5G and AI, but we are also exploring their synergies to realize our vision of the future. View this presentation to learn how AI is making 5G better -- in the network and on the device, why on-device AI processing is essential, and how 5G is empowering distributed learning over wireless.
AI firsts: Leading from research to proof-of-concept - Qualcomm Research
AI has made tremendous progress over the past decade, with many advancements building on fundamental research done decades ago. Accelerating the pipeline from research to commercialization is daunting, since scaling technologies in the real world involves many challenges beyond the theoretical work done in the lab. Qualcomm AI Research has taken on the task of not only generating novel AI research but also being first to demonstrate proofs-of-concept on commercial devices, enabling technology to scale in the real world. This presentation covers:
The challenges of deploying cutting-edge research on real-world mobile devices
How Qualcomm AI Research is solving system and feasibility challenges with full-stack optimizations to quickly move from research to commercialization
Examples where Qualcomm AI Research has had industrial or academic firsts
Artificial Intelligence (AI), specifically deep learning, is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences. However, the deep neural networks of today use too much memory, compute, and energy. Plus, to make AI truly ubiquitous, networks need to run on the end device within a tight power and thermal budget. One approach to help address these issues is quantization, which attempts to reduce the number of bits used for weight parameters and activation calculations without sacrificing model accuracy. This presentation covers: why quantization is important, existing quantization challenges, Qualcomm AI Research's existing quantization research, and how developers and researchers can take advantage of quantization on Qualcomm Snapdragon.
Data compression has improved by leaps and bounds over the years thanks to technical innovation, enabling the proliferation of streamed digital multimedia and voice over IP. For example, a regular cadence of technical advancement in video codecs has led to massive reductions in file size: up to a 1000x reduction when comparing a raw video file to a VVC-encoded file. However, with the rise of machine learning techniques and diverse data types to compress, AI may be a compelling tool for next-generation compression, offering a variety of benefits over traditional techniques. In this presentation we discuss:
- Why the demand for improved data compression is growing
- Why AI is a compelling tool for compression in general
- Qualcomm AI Research’s latest AI voice and video codec research
- Our future AI codec research work and challenges
This presentation outlines the synergistic nature of 5G and AI -- two disruptive areas of innovation that can change the world. It illustrates the benefits of adopting AI for the advancement of 5G, as well as showcases the latest progress made by Qualcomm Technologies, Inc.
This presentation highlights the architecture of the 5G network and the need for a highly efficient, automated, and autonomous management and orchestration environment that monitors the network and its services and triggers actions to optimize resources and meet quality-of-service targets.
Bringing AI research to wireless communication and sensing - Qualcomm Research
AI for wireless is already here, with applications in areas such as mobility management, sensing and localization, smart signaling, and interference management. Recently, Qualcomm Technologies prototyped the AI-enabled air interface and launched the Qualcomm 5G AI Suite. These developments are possible thanks to expertise in both wireless and machine learning built over a decade of foundational research in these complementary fields.
Our approach brings together the modeling flexibility and computational efficiency of machine learning and the out-of-domain generalization and interpretability of wireless domain expertise.
In this webinar, Qualcomm AI Research presents an overview of state-of-the-art research at the intersection of the two fields and offers a glimpse into the future of the wireless industry.
Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
Speakers:
Arash Behboodi, Machine Learning Research Scientist (Senior Staff Engineer/Manager), Qualcomm AI Research
Daniel Dijkman, Machine Learning Research Scientist (Principal Engineer), Qualcomm AI Research
This presentation covers an industry perspective and a roadmap towards 5G with open and democratized interfaces. It covers examples of open reference platforms and how open-source communities can complement standards bodies such as 3GPP and IEEE. It characterizes RAN and user- and control-plane core microservices and discusses opportunities for embedded network telemetry for emerging machine learning applications.
Speaker: Tom Tofigh, Principal Member of Technical Staff (Architect) at AT&T
Ericsson brings new updates to its 5G platform, introducing 5G network services to support operators from preparation to 5G launch. The Ericsson 5G services roadmap spans three distinct phases: Prepare, Mobilize, and Launch. Through our service offerings, operators can now evolve their 4G networks and smoothly start introducing 5G, reaching new heights on their journey to 5G.
Experiences are everything and Juniper knows this. From when a user engages with an app on their smartphone to when a workload is generated in the cloud to pick up the request, we know that every point of contact along the way impacts the user’s experience, from client to cloud. Learn more about what Juniper has recently announced in this SlideShare!
3D perception is crucial for understanding the real world. It offers many benefits and new capabilities over 2D across diverse applications, from XR and autonomous driving to IoT, camera, and mobile. 3D perception with machine learning is creating the new state of the art (SOTA) in areas such as depth estimation, object detection, and neural scene representation. Making these SOTA neural networks feasible for real-world deployment on mobile devices constrained by power, thermal, and performance budgets has been a challenge. Qualcomm AI Research has developed not only novel AI techniques for 3D perception but also full-stack AI optimizations to enable real-world deployments and energy-efficient solutions. This presentation explores the latest research that is enabling efficient 3D perception while maintaining neural network model accuracy. You’ll learn about:
- The advantages of 3D perception over 2D and the need for 3D perception across applications
- Advancements in 3D perception research by Qualcomm AI Research
- Our future 3D perception research directions
Next IIoT wave: embedded digital twin for manufacturing - IRS srl
The next IIoT wave will be a population of digital twins. A digital twin is a real-time digital replica of a physical device. Developing an embedded digital twin allows superior device diagnostics and failure anticipation. Discover how to implement an embedded digital twin using real-time monitoring, physical models, and machine learning.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2023/06/accelerating-newer-ml-models-using-the-qualcomm-ai-stack-a-presentation-from-qualcomm/
Vinesh Sukumar, Senior Director and Head of AI/ML Product Management at Qualcomm Technologies, presents the “Accelerating Newer ML Models Using the Qualcomm AI Stack” tutorial at the May 2023 Embedded Vision Summit.
The Qualcomm AI Stack revolutionizes how Qualcomm thinks about AI software and provides the ultimate tool and user interface to enable ecosystem partners to create faster and smarter AI applications for all embedded form factors. Focusing on real user experience challenges centered around model deployment, Sukumar explains how the Snapdragon developer community leverages data types, quantization, and neural architecture search, among others, to optimize complex AI architectures for emerging use cases.
Accelerating algorithmic and hardware advancements for power efficient on-dev... - Qualcomm Research
Artificial Intelligence (AI), specifically deep learning, is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences. However, the deep neural networks of today are growing quickly in size and use too much memory, compute, and energy. Plus, to make AI truly ubiquitous, it needs to run on the end device within a tight power and thermal budget. One approach to address these issues is Bayesian deep learning. This presentation covers:
• Why AI algorithms and hardware need to be energy efficient
• How Bayesian deep learning is making neural networks more power efficient through model compression and quantization
• How we are doing fundamental research on AI algorithms and hardware to maximize power efficiency
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2 - Tyrone Systems
For all who were unable to attend or would like to recap our live webinar, Deep Learning for TensorFlow Series part 2, we have all the information for you so you won't miss out!
Artificial Intelligence (AI) is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences. However, the deep neural networks of today are growing quickly in size and use too much memory, compute, and energy. Plus, to make AI truly ubiquitous, it needs to run on the end device within a tight power and thermal budget.
Fundamental research in AI, as well as applying that research, is required to advance AI further and speed up adoption. In this presentation, learn about:
* Several research topics across the entire spectrum of AI, such as generalized CNNs and deep generative models
* AI model optimization research for power efficiency, including compression, quantization, and compilation
* Advances in AI research to make AI ubiquitous
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/sep-2019-alliance-vitf-facebook
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Raghuraman Krishnamoorthi, Software Engineer at Facebook, delivers the presentation "Quantizing Deep Networks for Efficient Inference at the Edge" at the Embedded Vision Alliance's September 2019 Vision Industry and Technology Forum. Krishnamoorthi gives an overview of practical deep neural network quantization techniques and tools.
This talk was presented in Startup Master Class 2017 - http://aaiitkblr.org/smc/ 2017 @ Christ College Bangalore. Hosted by IIT Kanpur Alumni Association and co-presented by IIT KGP Alumni Association, IITACB, PanIIT, IIMA and IIMB alumni.
My co-presenter was Biswa Gourav Singh. And contributor was Navin Manaswi.
http://dataconomy.com/2017/04/history-neural-networks/ - timeline for neural networks
Novi Sad AI is the first AI community in Serbia, with the goal of democratizing knowledge of AI. At our first event we talked about belief networks, deep learning, and much more.
Once-for-All: Train One Network and Specialize it for Efficient Deployment - taeseon ryu
Hello from the deep learning paper reading group! The paper we introduce today is titled Once-for-All: Train One Network and Specialize it for Efficient Deployment.
The paper looks at the situation of actually deploying a trained model on hardware. The biggest problem it identifies is that there are far too many hardware environments on which a trained model may be deployed: because every device has different resources, finding a model that fits every hardware target is practically impossible.
Since the optimal network architecture differs for every hardware target, the usual question is what to do about it. One possible approach is to search for the optimal architecture for each hardware target separately, but this is infeasible because it demands an enormous amount of computation. Take the Samsung Note 10 as an example: if an application requires the model to run within 20 ms, then to find which models can run within that 20 ms and what accuracy they achieve, you would have to evaluate every candidate point (the blue dots in the paper's figure), where each point corresponds to one training run. In practice you would have to run a large number of trainings and then still determine which result is optimal. Because this cost grows linearly as the number of deployment scenarios increases,
finding the optimal network for each hardware target is practically impossible.
The approach OFA proposes is that once a single network has been trained, there is no need to retrain it for each hardware target: you simply take the subnetwork that suits each environment. This is the main approach the paper uses.
For today's paper review, Donghyun Kim of the fundamentals team kindly provided a detailed review. Thank you in advance for your interest!
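To make the OFA idea concrete, here is a toy sketch of slicing a subnetwork out of a trained supernet without retraining. The dimensions are invented for illustration, and the real OFA method additionally sorts channels by importance before slicing; this is not the paper's code.

```python
import numpy as np

# A "supernet" layer trained at maximum width: 64 output channels, 32 inputs.
rng = np.random.default_rng(0)
supernet_weight = rng.standard_normal((64, 32)).astype(np.float32)

def extract_subnet(weight, out_channels, in_channels):
    """Take the leading channels of the trained weight; no retraining needed.
    OFA additionally sorts channels by importance before slicing."""
    return weight[:out_channels, :in_channels]

# Deploy time: pick a width that fits the target device's latency budget.
sub = extract_subnet(supernet_weight, out_channels=32, in_channels=32)
x = rng.standard_normal(32).astype(np.float32)
print(sub @ x)  # the subnetwork runs with half the parameters
```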
Artificial Intelligence (AI) is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences. However, this is just the start of the AI revolution. The field of AI, especially deep learning, is still in its infancy with tremendous opportunity for exploration and improvement. For instance, deep neural networks of today are rapidly growing in size and use too much memory, compute, and energy. To make AI truly ubiquitous, it needs to run on the end device within a tight power and thermal budget. New approaches and fundamental research in AI, as well as applying that research, are required to advance machine learning further and speed up adoption. View this presentation to learn about select research topics that Qualcomm AI Research is investigating, including:
o AI model optimization research for power efficiency, including our latest quantization research
o Applied AI research, such as using deep learning for improved radar functionality
o Fundamental AI research, such as source compression and quantum AI
Implementing AI: Running AI at the Edge, hosted by KTN and eFutures, is the second event of the Implementing AI webinar series.
To make products more intelligent, more responsive and to reduce the data generated, it is advantageous to run AI on the product itself, as opposed to in the cloud.
The focus of this webinar was the opportunities and challenges of moving the AI processing to “the Edge”. The webinar had four presentations from experts covering overviews of the opportunity, implementation techniques and case studies.
Find out more: https://ktn-uk.co.uk/news/just-launched-implementing-ai-webinar-series
AWS Summit Berlin 2013 - Big Data Analytics - AWS Germany
Learn more about the tools, techniques and technologies for working productively with data at any scale. This session will introduce the family of data analytics tools on AWS which you can use to collect, compute and collaborate around data, from gigabytes to petabytes. We'll discuss Amazon Elastic MapReduce, Hadoop, structured and unstructured data, and the EC2 instance types which enable high performance analytics.
NERSC is the production high-performance computing (HPC) center for the United States Department of Energy (DOE) Office of Science. The center supports over 6,000 users in 600 projects, using a variety of applications in materials science, chemistry, biology, astrophysics, high energy physics, climate science, fusion science, and more.
NERSC deployed the Cori system on over 9,000 Intel® Xeon Phi™ processors. This session describes the optimization strategy for porting codes that target traditional manycore architectures to the processors. We also discuss highlights and lessons learned from the optimization process on 20 applications associated with the NERSC Exascale Science Application Program (NESAP).
Learn about Tensorflow for Deep Learning now! Part 1 - Tyrone Systems
In this comprehensive workshop, learn how to use TensorFlow, how to build data pipelines, and how to implement a simple deep learning model using TensorFlow Keras. Enhance your knowledge and skills by gaining a better understanding of TensorFlow with all the resources we have available for you!
Similar to Intelligence at scale through AI model efficiency
Generative AI models, such as ChatGPT and Stable Diffusion, can create new and original content like text, images, video, audio, or other data from simple prompts, as well as handle complex dialogs and reason about problems with or without images. These models are disrupting traditional technologies, from search and content creation to automation and problem solving, and are fundamentally shaping the future user interface to computing devices. Generative AI can apply broadly across industries, providing significant enhancements for utility, productivity, and entertainment. As generative AI adoption grows at record-setting speeds and computing demands increase, on-device and hybrid processing are more important than ever. Just like traditional computing evolved from mainframes to today’s mix of cloud and edge devices, AI processing will be distributed between them for AI to scale and reach its full potential.
In this presentation you’ll learn about:
- Why on-device AI is key
- Full-stack AI optimizations to make on-device AI possible and efficient
- Advanced techniques like quantization, distillation, and speculative decoding
- How generative AI models can be run on device and examples of some running now
- Qualcomm Technologies’ role in scaling on-device generative AI
As generative AI adoption grows at record-setting speeds and computing demands increase, hybrid processing is more important than ever. But just like traditional computing evolved from mainframes and thin clients to today’s mix of cloud and edge devices, AI processing must be distributed between the cloud and devices for AI to scale and reach its full potential. In this talk you’ll learn:
• Why on-device AI is key
• Which generative AI models can run on device
• Why the future of AI is hybrid
• Qualcomm Technologies’ role in making hybrid AI a reality
- There is a rich roadmap of 5G technologies coming in the second half of the 5G decade with the 5G Advanced evolution
- 6G will be the future innovation platform for 2030 and beyond building on the 5G Advanced foundation
- 6G will be more than just a new radio design, expanding the role of AI, sensing and others in the connected intelligent edge
- Qualcomm is leading cutting-edge wireless research across six key technology vectors on the path to 6G
5G is going mainstream across the globe, and this is an exciting time to harness the low latency and high capacity of 5G to enable the metaverse. A distributed-compute architecture across device and cloud can enable rich extended reality (XR) user experiences. Virtual reality (VR) and mixed reality (MR) are ready for deployment in private networks, while augmented reality (AR) for wide area networks can be enabled in the near term with Wi-Fi powered AR glasses paired with a 5G-enabled phone. Device APIs enabling application adaptation are critical for a good user experience. 5G standards are evolving to support the deployment of AR glasses at a large scale and setting the stage for the 6G era with the merging of the physical, digital, and virtual worlds. Techniques like perception-enhanced wireless offer significant potential to improve user experience. Qualcomm Technologies is enabling the XR industry with platforms, developer SDKs, and reference designs.
Check out this webinar to learn:
• How 5G and distributed-compute architectures enable the metaverse
• The latest results from our boundless XR 5G/6G testbed, including device APIs and perception-enhanced wireless
• 5G standards evolution for enhancing XR applications and the road to 6G
• How Qualcomm Technologies is enabling the industry with platforms, SDKs, and reference designs
How will sidelink bring a new level of 5G versatility - Qualcomm Research
Today, the 5G system mainly operates on a network-to-device communication model, exemplified by enhanced mobile broadband use cases where all data transmissions are between the network (i.e., base station) and devices (e.g., smartphone). However, to fully deliver on the original 5G vision of supporting diverse devices, services, and deployment scenarios, we need to expand the 5G topology further to reach new levels of performance and efficiency.
That is why sidelink communication was introduced in 3GPP standards, designed to facilitate direct communication between devices, independent of connectivity via the cellular infrastructure. Beyond automotive communication, it also benefits many other 5G use cases such as IoT, mobile broadband, and public safety.
5G is designed to serve an unprecedented range of capabilities with a single global standard. With enhanced mobile broadband (eMBB), massive IoT (mIoT), and mission-critical IoT, the three pillars of 5G represent extremes in performance and associated complexity. For IoT services, NB-IoT and eMTC devices prioritize low power consumption and the lowest complexity for wide-area deployments (LPWA), while enhanced ultra-reliable, low-latency communication (eURLLC), along with time-sensitive networking (TSN), delivers the most stringent use case requirements. But there exists an opportunity to more efficiently address a broad range of mid-tier applications with capabilities ranging between these extremes.
In 5G NR Release 17, 3GPP introduced a new tier of reduced capability (RedCap) devices, also known as NR-Light. It is a new device platform that bridges the capability and complexity gap between the extremes in 5G today with an optimized design for mid-tier use cases. With the recent standards completion, NR-Light is set to efficiently expand the 5G universe to connect new frontiers.
Download this presentation to learn:
• What NR-Light is and why it can herald the next wave of 5G expansion
• How NR-Light is accelerating the growth of the connected intelligent edge
• Why NR-Light is a suitable 5G migration path for mid-tier LTE devices
Realizing mission-critical industrial automation with 5G - Qualcomm Research
Manufacturers seeking better operational efficiencies, with reduced downtime and higher yield, are at the leading edge of the Industry 4.0 transformation. With mobile system components and reliable wireless connectivity between them, flexible manufacturing systems can be reconfigured quickly for new tasks, to troubleshoot issues, or in response to shifts in supply and demand.
There is a long history of R&D collaboration between Bosch Rexroth and Qualcomm Technologies for the effective application of these 5G capabilities to industrial automation use cases. At the Robert Bosch Elektronik GmbH factory in Salzgitter, Germany, this collaboration has reached new heights.
Download this deck to learn how:
• Qualcomm Technologies and Bosch Rexroth are collaborating to accelerate the Industry 4.0 transformation
• 5G technologies deliver key capabilities for mission-critical industrial automation
• Distributed control solutions can work effectively across 5G TSN networks
• A single 5G technology platform solves connectivity and positioning needs for flexible manufacturing
3GPP Release 17: Completing the first phase of 5G evolution - Qualcomm Research
This presentation summarizes the 5G NR Release 17 projects that were completed in March 2022. Release 17 further enhances the 5G foundation and expands 5G into new devices, use cases, and verticals.
Setting off the 5G Advanced evolution with 3GPP Release 18 - Qualcomm Research
In December 2021, 3GPP reached consensus on the scope of 5G NR Release 18. This is a significant milestone marking the beginning of 5G Advanced, the second wave of wireless innovations that will fulfill the 5G vision. Release 18 will build on the solid foundation set by Releases 15, 16, and 17, and it sets the longer-term evolution direction of 5G and beyond. This release will encompass a wide range of new and enhancement projects, ranging from improved MIMO and an AI/ML-enabled air interface to extended reality optimizations and broader IoT support.
Cellular networks have facilitated positioning in addition to voice or data communications from the beginning, since 2G, and we’ve since grown to rely on positioning technology to make our lives safer, simpler, more productive, and even fun. Cellular positioning complements other technologies to operate indoors and outdoors, including dense urban environments where tall buildings interfere with satellite positioning. It works whether we’re standing still, walking, or in a moving vehicle. With 5G, cellular positioning breaks new ground to bring robust precise positioning indoors and outdoors, to meet even the most demanding Industry 4.0 needs.
As we look to the future, the Connected Intelligent Edge will bring a new dimension of positional insight to a broad range of devices, improving wireless use cases still under development. We’re already charting the course to 5G Advanced and beyond by working on the evolution of cellular positioning technology to include RF sensing for situational awareness.
Download the deck to learn more.
The need for intelligent, personalized experiences powered by AI is ever-growing. Our devices are producing more and more data that could help improve our AI experiences. How do we learn and efficiently process all this data from edge devices while maintaining privacy? On-device learning rather than cloud training can address these challenges. In this presentation, we’ll discuss:
- Why on-device learning is crucial for providing intelligent, personalized experiences without sacrificing privacy
- Our latest research in on-device learning, including few-shot learning, continuous learning, and federated learning (a minimal federated averaging sketch follows this list)
- How we are solving system and feasibility challenges to move from research to commercialization
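As promised above, here is a minimal sketch of federated averaging (FedAvg), where a server combines client model updates weighted by local data counts. This is the generic textbook formulation, not Qualcomm's implementation.

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """Aggregate client models: weighted mean by number of local samples.
    Raw data never leaves the devices; only model weights are shared."""
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three devices hold models trained locally on differently sized datasets.
clients = [np.array([0.9, 1.1]), np.array([1.2, 0.8]), np.array([1.0, 1.0])]
sizes = [100, 300, 600]
global_model = federated_average(clients, sizes)
print(global_model)  # the server redistributes this averaged model
```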
How to build high performance 5G networks with vRAN and O-RAN - Qualcomm Research
5G networks are poised to deliver an unprecedented amount of data from a richer set of use cases than we have ever seen. This makes efficient networking in terms of scalability, cost, and power critical for the sustainable growth of 5G. Cloud technologies such as virtualization, containerization and orchestration are now powering a surge of innovation in virtualized radio access network (vRAN) infrastructure with modular hardware and software components, and standardized interfaces. While commercial off-the-shelf (COTS) hardware platforms provide the compute capacity for running vRAN software, hardware accelerators will also play a major role in offloading real-time and complex signal processing functions. Together, COTS platforms and hardware accelerators provide the foundation for building the intelligent 5G network and facilitate innovative new use cases with the intelligent wireless edge.
This presentation takes a look at the technology roadmap for 5G NR millimeter wave (mmWave), including features such as integrated access and backhaul (IAB) and enhancements in beam management, mobility, coverage, and more. For more information, please visit www.qualcomm.com/mmwave
Video data is abundant and being generated at ever increasing rates. Analyzing video with AI can provide valuable insights and capabilities for many applications ranging from autonomous driving and smart cameras to smartphones and extended reality. However, as video resolution and frame rates increase while AI video perception models become more complex, running these workloads in real time is becoming more challenging. This presentation explores the latest research that is enabling efficient video perception while maintaining neural network model accuracy. You’ll learn about:
- How video perception is crucial for understanding the world and making devices smarter
- The challenges of on-device real-time video perception at high resolution through AI
- Qualcomm AI Research’s latest research and techniques for efficient video perception
Check out: https://www.qualcomm.com/AI
Enabling the rise of the smartphone: Chronicling the developmental history at... - Qualcomm Research
Today’s smartphones are a marvel of modern technology — handheld devices with vast computing power, incredible multimedia and AI capabilities, and blazing fast data rates that support mobile browsing, social media interaction, and more. From humble beginnings as a cellphone focused purely on voice communication, the capability and functionality of modern smartphones have advanced tremendously. This presentation chronicles Qualcomm’s role in the rise of the smartphone from its initial beginnings to becoming the largest computing platform in the world. It includes:
- Key technology developments that led to today’s smartphones
- The role of Moore’s Law in driving new innovations and additional integration into mobile processors
- Qualcomm’s critical role in advancing the smartphone’s capabilities through groundbreaking innovations and key technology developments
This presentation provides an overview of important 5G innovations around new and enhanced use of spectrum. It also captures the current 5G spectrum status across the globe.
Transforming enterprise and industry with 5G private networks - Qualcomm Research
The 3GPP put the spotlight on industry expansion in July with 5G NR Release 16 and set the stage for enterprise and industry verticals to look at how to provide high-performance wireless connectivity with 5G private networks. With a variety of options for spectrum, different network architectures, a rich feature set to meet the demanding needs of the industrial Internet of Things (IIoT), and the privacy and security required for business assurance, 5G private networks are poised to transform enterprise and industry.
Watch the webinar at: https://pages.questexnetwork.com/Webinar-Qualcomm-Registration-101520.html?source=Qualcomm
Today, we take it for granted that our mobile devices and applications just work out of the box — smartphones can roam virtually anywhere in the world, laptops can seamlessly connect to any Wi-Fi access point & Bluetooth peripheral, and the videos recorded on one device can be played back perfectly on any other device.
The magic behind all this? Technology standards. Not only do they power a wide range of systems and devices but also bring many benefits to the broader technology ecosystem. At Qualcomm Technologies, we are leading the standardization of many key technologies that will move the world forward.
Download this presentation to learn:
- The value of technology standards, specifically in the areas of cellular, Wi-Fi, Bluetooth, and video codecs
- Why standardized technologies are essential for industry growth and ecosystem development
- How standards bodies operate in a complex, challenging, and ever-changing environment
- How Qualcomm is driving innovation in different technology standards
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality - Inflectra
In this insightful webinar, Inflectra explores how artificial intelligence (AI) is transforming software development and testing. Discover how AI-powered tools are revolutionizing every stage of the software development lifecycle (SDLC), from design and prototyping to testing, deployment, and monitoring.
Learn about:
• The Future of Testing: How AI is shifting testing towards verification, analysis, and higher-level skills, while reducing repetitive tasks.
• Test Automation: How AI-powered test case generation, optimization, and self-healing tests are making testing more efficient and effective.
• Visual Testing: Explore the emerging capabilities of AI in visual testing and how it's set to revolutionize UI verification.
• Inflectra's AI Solutions: See demonstrations of Inflectra's cutting-edge AI tools like the ChatGPT plugin and Azure Open AI platform, designed to streamline your testing process.
Whether you're a developer, tester, or QA professional, this webinar will give you valuable insights into how AI is shaping the future of software delivery.
Essentials of Automations: Optimizing FME Workflows with Parameters - Safe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
Smart TV Buyer Insights Survey 2024 - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Connector Corner: Automate dynamic content and events by pushing a button - DianaGray10
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Neuro-symbolic is not enough, we need neuro-*semantic* - Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Accelerate your Kubernetes clusters with Varnish Caching - Thijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Intelligence at scale through AI model efficiency
1. Intelligence at scale through AI model efficiency
April 6, 2021 | @QCOMResearch | Qualcomm Technologies, Inc.
2. Agenda
• Why efficient machine learning is necessary for AI to proliferate
• Our latest research to make AI models more efficient
• Our open-source projects to scale efficient AI
3. AI is being used all around us, increasing productivity, enhancing collaboration, and transforming industries: smartphones, smart homes, smart cities, smart factories, video monitoring, video conferencing, extended reality, and autonomous vehicles. AI video analysis is on the rise, with a trend toward more cameras, higher resolution, and increased frame rates across devices.
4. Deep neural networks are energy hungry and growing fast; AI is being powered by the explosive growth of deep neural networks.
[Chart: weight parameter count by year, log scale, 1940 to 2030. 1943: first NN (~10 parameters); 1988: NetTalk (~20K); 2009: Hinton's Deep Belief Net (~10M); 2013: Google/Y! (~1B); 2017: very large neural networks (137B); 2021: extremely large neural networks (1.6T); 2025 projection: N = 100T = 10^14. Source: Welling.]
By 2025, will we have reached the capacity of the human brain? The energy efficiency of a brain is 100x better than current hardware.
5. Power and thermal efficiency are essential for on-device AI.
The challenge of AI workloads: very compute intensive; large, complicated neural network models; complex concurrencies; always-on; real-time.
Constrained mobile environment: must be thermally efficient for sleek, ultra-light designs; storage/memory bandwidth limitations; requires long battery life for all-day use.
6. Holistic model efficiency research: multiple axes to shrink AI models and efficiently run them on hardware.
• Quantization: learning to reduce bit-precision while keeping desired accuracy
• Compression: learning to prune the model while keeping desired accuracy
• Compilation: learning to compile AI models for efficient hardware execution
• Neural architecture search: learning to design smaller neural networks that are on par with or outperform hand-designed architectures on real hardware
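As a concrete instance of the compression axis, here is a minimal magnitude-pruning sketch. This is a standard baseline technique, not the specific learned-pruning research referenced on the slide.

```python
import numpy as np

def magnitude_prune(weight, sparsity=0.5):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    threshold = np.quantile(np.abs(weight), sparsity)
    mask = np.abs(weight) >= threshold
    return weight * mask, mask

w = np.random.randn(4, 4).astype(np.float32)
pruned, mask = magnitude_prune(w, sparsity=0.75)
print(f"kept {mask.mean():.0%} of weights")
```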
7. Leading research to efficiently quantize AI models: automated reduction in the precision of weights and activations while maintaining accuracy. Promising results show that low-precision integer inference can become widespread, with virtually the same accuracy between an FP32 and a quantized AI model through:
• Automated, data-free, post-training methods
• An automated training-based mixed-precision method
Quantization yields significant performance-per-watt improvements from savings in memory and compute. Models trained at high precision (32-bit floating point, e.g., 3452.3194) can run inference at lower precision: 16-bit integer (e.g., 3452) for up to a 4X increase in performance per watt, 8-bit integer (e.g., 255) for up to 16X, and 4-bit integer (e.g., 15) for up to 64X.1
1: FP32 model compared to quantized model
8. Pushing the limits of what's possible with quantization (SOTA: state-of-the-art)
• Data-free quantization: How can we make quantization as simple as possible? We created an automated method that addresses bias and imbalance in weight ranges, with no training and no data. SOTA 8-bit results make 8-bit weight quantization ubiquitous: <1% accuracy drop for MobileNet V2 against the FP32 model. (Data-Free Quantization Through Weight Equalization and Bias Correction, Nagel, van Baalen, et al., ICCV 2019)
• AdaRound: Is rounding to the nearest value the best approach for quantization? We created an automated method for finding the best rounding choice, with no training and minimal unlabeled data. SOTA 4-bit weight results make 4-bit weight quantization ubiquitous: <2.5% accuracy drop for MobileNet V2 against the FP32 model. (Up or Down? Adaptive Rounding for Post-Training Quantization, Nagel, Amjad, et al., ICML 2020)
• Bayesian Bits: Can we quantize layers to different bit widths based on precision sensitivity? We created a novel method to learn mixed-precision quantization that jointly learns bit-width precision and pruning; training and training data are required. SOTA mixed-precision results automate mixed-precision quantization and enable the tradeoff between accuracy and kernel bit-width: <1% accuracy drop for MobileNet V2 against the FP32 model for a mixed-precision model with computational complexity equivalent to a 4-bit weight model. (Bayesian Bits: Unifying Quantization and Pruning, van Baalen, Louizos, et al., NeurIPS 2020)
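To see why the AdaRound question has a non-obvious answer, here is a toy brute-force illustration. It is not the actual AdaRound algorithm, which learns the rounding through a soft relaxation; it only demonstrates that picking per-weight up/down rounding to minimize output error can beat rounding to nearest.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(4)           # 4 weights of a toy "layer"
x = rng.standard_normal((256, 4))    # calibration inputs
scale = 0.3                          # assumed fixed quantization step

def output_mse(w_q):
    """Error of the quantized layer's outputs, not of the weights themselves."""
    return np.mean((x @ w - x @ w_q) ** 2)

nearest = np.round(w / scale) * scale
best = min(
    # Try every floor/ceil combination for the 4 weights (2^4 options).
    (np.where(up, np.ceil(w / scale), np.floor(w / scale)) * scale
     for up in itertools.product([False, True], repeat=4)),
    key=output_mse,
)
print(f"round-to-nearest MSE: {output_mse(nearest):.5f}")
print(f"best rounding MSE:    {output_mse(best):.5f}")  # never worse than nearest
```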
9. Optimizing and deploying state-of-the-art AI models for diverse scenarios at scale is challenging.
• Neural network complexity: many state-of-the-art neural network solutions are large, complex, and do not run efficiently on target hardware
• Neural network diversity: for different tasks and use cases, many different neural networks are required
• Device diversity: neural networks must be deployed to many different devices with different configurations and changing software
• Cost: compute and engineering resources for training plus evaluation are too costly and time consuming
10. NAS: neural architecture search, an automated way to learn a network topology that can achieve the best performance on a certain task.
• Search space: the set of operations and how they can be connected to form valid network architectures
• Search algorithm: a method for sampling a population of good network architecture candidates
• Evaluation strategy: a method to estimate the performance of sampled network architectures
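A toy sketch mapping these three components to code; the search space contents, the random-sampling search algorithm, and the proxy score below are all invented for illustration.

```python
import random

# Search space: operations and how they form valid architectures.
SEARCH_SPACE = {"kernel": [3, 5, 7], "expansion": [2, 4, 6], "depth": [1, 2, 3]}

def sample_architecture():
    """Search algorithm (here: uniform random sampling of candidates)."""
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(arch):
    """Evaluation strategy: a stand-in proxy score; a real system would
    train/finetune the candidate or query an accuracy/latency predictor."""
    return arch["depth"] * 1.0 + arch["expansion"] * 0.5 - arch["kernel"] * 0.1

best = max((sample_architecture() for _ in range(100)), key=evaluate)
print("best candidate:", best, "score:", evaluate(best))
```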
11. Existing NAS solutions do not address all the challenges.
• High cost: brute-force search is expensive (>40,000 epochs per platform)
• Lack of diverse search: hard to search in diverse spaces with different block types, attention, and activations
• Do not scale: a repeated training phase is needed for every new scenario and every new device (>40,000 epochs per platform)
• Unreliable hardware models: require differentiable cost functions
12. Introducing new AI research: DONNA (Distilling Optimal Neural Network Architectures), efficient NAS with hardware-aware optimization. A scalable method that finds Pareto-optimal network architectures in terms of accuracy and latency for any hardware platform at low cost. DONNA starts from an oversized pretrained reference architecture and produces a set of Pareto-optimal network architectures.
• Low cost: low start-up cost of 1000-4000 epochs, equivalent to training 2-10 networks from scratch
• Diverse search to find the best models: supports diverse spaces with different cell types, attention, and activation functions (ReLU, Swish, etc.)
• Scalable: scales to many hardware devices at minimal cost
• Reliable hardware measurements: uses direct hardware measurements instead of a potentially inaccurate hardware model
(Distilling Optimal Neural Networks: Rapid Search in Diverse Spaces, Moons, Bert, et al., arXiv 2020)
13. DONNA 4-step process. Objective: build an accuracy model of the search space once, then deploy to many scenarios.
Step A: Define the reference and search space once.
[Figure: a five-block backbone with fixed channels, head, and stem; per-block varying parameters are kernel size, expansion factor, network depth, network width, attention/activation, and different efficient layer types.]
14. Define the reference architecture and search space once
A diverse search space is essential for finding optimal architectures with higher accuracy (a sampling sketch follows this list).
• Select the reference architecture: the largest model in the search space.
• Chop the NN into blocks: fix the STEM, HEAD, number of blocks, strides, and number of channels at block edges.
• Choose the search space: a diverse factorized hierarchical search space, including variable kernel size, expansion rate, depth, number of channels, cell type, activation, and attention.
[Figure: example backbone of STEM (Conv 3x3 s2 plus DW Conv, ch=32), five blocks (strides 2, 2, 2, 1, 2; ch=32, 64, 96, 128, 196, 256), and HEAD (Conv 1x1, average pool, FC, ch=1536).]
Per-block search-space choices:
• Kernel: 3, 5, 7
• Expand: 2, 3, 4, 6
• Depth: 1, 2, 3, 4
• Attention: SE, no SE
• Activation: ReLU / Swish
• Cell type: grouped, DW, …
• Width scale: 0.5x, 1.0x
Ch: channel; SE: Squeeze-and-Excitation
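As a rough illustration, the per-block choices above could be encoded and sampled as follows; BLOCK_SPACE, NUM_BLOCKS, and sample_network are hypothetical names, and a real implementation would also honor the fixed STEM/HEAD constraints.

import random

BLOCK_SPACE = {
    "kernel": [3, 5, 7],
    "expand": [2, 3, 4, 6],
    "depth": [1, 2, 3, 4],
    "attention": ["SE", "none"],
    "activation": ["ReLU", "Swish"],
    "cell_type": ["grouped", "DW"],
    "width_scale": [0.5, 1.0],
}
NUM_BLOCKS = 5  # STEM, HEAD, strides, and block-edge channels stay fixed

def sample_network():
    """One candidate = an independent choice per block."""
    return [{knob: random.choice(opts) for knob, opts in BLOCK_SPACE.items()}
            for _ in range(NUM_BLOCKS)]

candidate = sample_network()
print(candidate[0])  # e.g. {'kernel': 5, 'expand': 3, ...}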
15. DONNA 4-step process. Objective: build an accuracy model of the search space once, then deploy to many scenarios.
Step A (recap): Define the reference and search space once.
Step B: Build an accuracy model via Knowledge Distillation (KD) once. Approximate ideal projections of the reference model through KD, computing a blockwise MSE for each of the five blocks, then use the quality of the blockwise approximations to build the accuracy model (a distillation sketch follows below).
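A minimal PyTorch sketch of the blockwise distillation step, assuming candidate_block, reference_block, and a non-empty loader of intermediate activations are supplied by the caller; this illustrates the idea rather than reproducing the DONNA code.

import torch
import torch.nn as nn

def distill_block(candidate_block, reference_block, loader, epochs=1):
    """Train one candidate block to mimic the reference block's output;
    the residual MSE becomes that block's quality metric."""
    opt = torch.optim.Adam(candidate_block.parameters())
    mse = nn.MSELoss()
    for _ in range(epochs):
        for x in loader:                      # x: reference input to this block
            with torch.no_grad():
                target = reference_block(x)   # "ideal projection" of the teacher
            loss = mse(candidate_block(x), target)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return loss.item()                        # per-block quality metric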
16. Build the accuracy predictor via BKD once
A low-cost, hardware-agnostic training phase (a predictor-fitting sketch follows this list):
• Block library: pretrain all blocks in the search space through blockwise knowledge distillation. Block training is fast, trivially parallelized, and covers a broad search space, yielding block pretrained weights and block quality metrics.
• Architecture library: quickly finetune a representative set of architectures. Network training is fast, and only 20-30 NNs are required.
• Accuracy predictor: fit a linear regression model (regularized ridge regression) for accurate predictions.
BKD: blockwise knowledge distillation
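The predictor stage can be illustrated with scikit-learn's ridge regression; the arrays below are synthetic stand-ins, and only the shape of the recipe (20-30 networks, per-block quality metrics in, accuracy out) comes from the slide.

import numpy as np
from sklearn.linear_model import Ridge

# One row per finetuned architecture: its five per-block distillation MSEs.
block_mse = np.random.rand(25, 5)          # 20-30 sampled networks suffice
top1_acc = 76.0 - 10.0 * block_mse.sum(1)  # stand-in measured accuracies

predictor = Ridge(alpha=1.0).fit(block_mse, top1_acc)
print(predictor.predict(block_mse[:1]))    # predicted accuracy for a candidate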
17. DONNA 4-step process. Objective: build an accuracy model of the search space once, then deploy to many scenarios.
Steps A and B (recap): Define the reference and search space once; build the accuracy model via Knowledge Distillation (KD) once.
Step C: Evolutionary search in 24h. A scenario-specific search trades predicted accuracy against HW latency, covering different compiler versions and different image sizes.
Qualcomm Snapdragon is a product of Qualcomm Technologies, Inc. and/or its subsidiaries.
18. Evolutionary search with real hardware measurements
Scenario-specific search allows users to select optimal architectures for real-life deployments (a simplified search loop is sketched below).
• Quick turnaround time: results in about 1 day using one measurement device.
• Accurate scenario-specific search: captures all intricacies of the hardware platform and software, e.g., run-time version or devices.
[Figure: an NSGA-II sampling algorithm proposes end-to-end models; the task accuracy predictor supplies predicted task accuracy, while the target HW supplies latency measured on device.]
NSGA: Non-dominated Sorting Genetic Algorithm
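A heavily simplified sketch of the search loop, assuming caller-supplied sample_fn, mutate_fn, predictor, and measure_latency; DONNA uses NSGA-II multi-objective sampling, whereas this sketch scalarizes to a single latency-constrained objective for brevity.

import random

def evolve(sample_fn, mutate_fn, predictor, measure_latency,
           latency_budget_ms, generations=50, pop_size=32):
    """Keep the most-accurate architectures that fit the latency budget,
    then refill the population by mutating the survivors."""
    pop = [sample_fn() for _ in range(pop_size)]
    for _ in range(generations):
        scored = []
        for arch in pop:
            lat = measure_latency(arch)      # real on-device measurement
            acc = predictor(arch)            # accuracy model from step B
            fit = acc if lat <= latency_budget_ms else float("-inf")
            scored.append((fit, arch))
        scored.sort(key=lambda t: t[0], reverse=True)
        parents = [a for _, a in scored[: pop_size // 2]]
        pop = parents + [mutate_fn(random.choice(parents))
                         for _ in range(pop_size - len(parents))]
    return scored[0][1]                      # best feasible architecture found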
19. DONNA 4-step process. Objective: build an accuracy model of the search space once, then deploy to many scenarios.
Steps A-C (recap): Define the reference and search space once; build the accuracy model via Knowledge Distillation (KD) once; run the evolutionary search in 24h.
Step D: Sample and finetune. Use the KD-initialized blocks from step B to finetune any network in the search space in 15-50 epochs instead of 450.
20. Quickly finetune the predicted Pareto-optimal architectures
Finetune to reach full accuracy and complete hardware-aware optimization for on-device AI deployments (a loss sketch follows below). Networks are initialized from the block pretrained weights and trained with soft distillation on the teacher logits: a soft cross-entropy term against the BKD-reference network plus a cross-entropy (CE) term on the ground-truth labels. This turns the predicted accuracy vs. HW latency Pareto front into a confirmed one.
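The finetuning loss can be sketched as follows; the temperature T and mixing weight alpha are illustrative choices, not values from the paper.

import torch
import torch.nn.functional as F

def finetune_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """CE on ground-truth labels plus soft CE against the teacher logits."""
    hard = F.cross_entropy(student_logits, labels)           # ground truth
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * T * T           # soft distillation
    return alpha * hard + (1 - alpha) * soft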
21. 21
Top-1
val
accuracy
[%]
Qualcomm® Adreno™ 660 GPU in the Snapdragon 888 running on the Samsung Galaxy S21. 2: Qualcomm® Hexagon™ 780 Processor in the Snapdragon 888 running on the Samsung Galaxy S21.
Qualcomm Adreno and Qualcomm Hexagon are products of Qualcomm Technologies, Inc. and/or its subsidiaries.
DONNA finds state-of-the-art
networks for on-device scenarios
Quickly optimize and make tradeoffs in model accuracy with respect
to the deployment conditions that matter
# of Parameters [M]
FLOPS
FPS
Desktop GPU latency
FPS
Mobile SoC latency1
(Adreno 660 GPU)
FPS
Mobile SoC latency2
(Hexagon 780 Processor)
20%
faster at similar
accuracy
20%
faster at similar
accuracy
224x224 images 224x224 images
224x224 images 672x672 images
20%
faster at similar
accuracy
22. DONNA provides MnasNet-level diversity at 100x lower cost
DONNA efficiently finds optimal models over diverse scenarios; its cost of training is a handful of architectures* (a worked amortization check follows the table).

Method  | Granularity | Macro-diversity | Search cost, 1 scenario [epochs] | Cost/scenario, 4 scenarios [epochs] | Cost/scenario, ∞ scenarios [epochs]
OFA     | Layer-level | Fixed           | 1200 + 10×[25-75]                | 550-1050                            | 250-750
DNA     | Layer-level | Fixed           | 770 + 10×450                     | 4700                                | 4500
MnasNet | Block-level | Variable        | 40000 + 10×450                   | 44500                               | 44500
DONNA   | Block-level | Variable        | 4000 + 10×50                     | 1500                                | 500

*Training 1 model from scratch = 450 epochs
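These per-scenario numbers follow from amortizing each method's one-time start-up cost over the number of scenarios N; MnasNet's full search must be repeated per scenario, so nothing amortizes. A worked check of the DONNA row (10 finetuned models of 50 epochs each per scenario):

\[
\text{cost/scenario}(N) = \frac{4000}{N} + 10 \times 50
\quad\Rightarrow\quad
\text{cost/scenario}(4) = 1000 + 500 = 1500,
\qquad
\lim_{N \to \infty} \text{cost/scenario}(N) = 500 .
\]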
23. DONNA finds state-of-the-art networks for on-device scenarios
Quickly optimize and make tradeoffs in model accuracy with respect to the deployment conditions that matter. DONNA applies directly to downstream tasks and non-CNN neural architectures without conceptual code changes.
[Figure: object detection (val mAP %) and vision transformers (predicted top-1 accuracy) versus # of multiply-accumulate operations [FLOPS], compared against ResNet-50, mobile models, ViT-B, and DeiT-B.]
24. User perspective for DONNA
Build an accuracy model of the search space once, then deploy to many scenarios. IN: an oversized pretrained reference architecture. OUT: a set of Pareto-optimal network architectures. (A runnable stub of the flow follows this list.)
• A: Define a search space of smaller, faster network architectures.
• B: Create an accuracy predictor for all network architectures, roughly equal to training 2-10 nets from scratch.
• C: Execute predictor-driven evolutionary search on-device, about 1 day per use case.
• D: Finetune the best searched models, 9-30x faster vs. regular training, about 50 GPU-hrs per net.
Deploy the best models @ 3ms, 6ms, 9ms, … on different chips.
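A runnable stub of that A-to-D flow; every helper here is a trivial placeholder so the control flow executes, not a published DONNA API.

def define_search_space(ref):            return {"ref": ref}           # step A
def build_accuracy_predictor(space):     return lambda arch: 0.0       # step B
def evolutionary_search(space, pred, device):                          # step C
    return ["arch-3ms", "arch-6ms"]
def finetune(arch):                      return arch + "-finetuned"    # step D

def donna_pipeline(reference_model, devices):
    space = define_search_space(reference_model)   # once
    predictor = build_accuracy_predictor(space)    # once, ~2-10 nets of cost
    return {d: [finetune(a) for a in
                evolutionary_search(space, predictor, d)]
            for d in devices}                      # ~1 day per scenario

print(donna_pipeline("oversized-reference", ["Adreno 660", "Hexagon 780"]))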
25. DONNA conclusions
• DONNA shrinks big networks in a hardware-efficient way.
• DONNA can be rerun for any new device or setting within a day.
• DONNA works on many different tasks out of the box.
• DONNA enables scalability and allows models to be easily updated after small changes rather than starting from scratch.
26. Leading AI research and fast commercialization
Driving the industry towards integer inference and power-efficient AI.
Quantization research: Relaxed Quantization (ICLR 2019), Data-Free Quantization (ICCV 2019), AdaRound (ICML 2020), and Bayesian Bits (NeurIPS 2020).
Quantization open-sourcing: AI Model Efficiency Toolkit (AIMET) and AIMET Model Zoo.
AIMET and AIMET Model Zoo are products of Qualcomm Innovation Center, Inc.
28. AIMET makes AI models small
An open-sourced GitHub project that includes state-of-the-art quantization and compression techniques from Qualcomm AI Research. AIMET takes a trained TensorFlow or PyTorch model, applies compression and quantization, and produces an optimized model ready for deployment.
Features:
• State-of-the-art network compression tools
• State-of-the-art quantization tools
• Support for both TensorFlow and PyTorch
• Benchmarks and tests for many models
• Developed by professional software developers
If interested, please join the AIMET GitHub project: https://github.com/quic/aimet
29. 29
Benefits
Lower memory
bandwidth
Lower
power
Lower
storage
Higher
performance
Maintains model
accuracy
Simple
ease of use
AIMET
Providing advanced
model efficiency
features and benefits
Features
Quantization
Compression
State-of-the-art INT8 and
INT4 performance
Post-training quantization methods,
including Data-Free Quantization
and Adaptive Rounding (AdaRound) —
coming soon
Quantization-aware training
Quantization simulation
Efficient tensor decomposition
and removal of redundant
channels in convolution layers
Spatial singular value
decomposition (SVD)
Channel pruning
Visualization
Analysis tools for drawing insights
for quantization and compression
Weight ranges
Per-layer compression sensitivity
30. AIMET features and APIs are easy to use
Designed to fit naturally in the AI model development workflow for researchers, developers, and ISVs. APIs are invoked directly from the pipeline, with framework-specific APIs for TensorFlow and PyTorch plus a direct algorithm API (a usage sketch follows below).
User-friendly APIs:

compress_model(model,
               eval_callback=obj_det_eval,
               compress_scheme=Scheme.spatial_svd, ...)

equalize_model(model, ...)

[Figure: AIMET architecture: framework-specific APIs and an algorithm API sit on top of a model optimization library (techniques to compress and quantize models), with extensions for other frameworks.]
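A sketch of invoking the equalize_model API named above from PyTorch; the module path follows the AIMET GitHub project's documentation, but verify current signatures at github.com/quic/aimet before relying on them (compress_model similarly lives in AIMET's compression API).

from torchvision.models import mobilenet_v2
from aimet_torch.cross_layer_equalization import equalize_model  # AIMET PyTorch API

# Any trained PyTorch model works; MobileNet V2 matches the results above.
model = mobilenet_v2(pretrained=True).eval()

# One-call cross-layer equalization (the first Data-Free Quantization step);
# input_shapes lets AIMET trace the model.
equalize_model(model, input_shapes=(1, 3, 224, 224))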
31. Data-Free Quantization (DFQ) results in AIMET
A post-training technique enabling INT8 inference with very minimal loss in accuracy. The pipeline runs (a toy illustration of the first step follows below):
• Cross-layer equalization: equalize weight ranges and avoid high activation ranges.
• Bias absorption.
• Weight quantization.
• Bias correction: measure and correct the shift in layer outputs.
• Activation range estimation: estimate activation ranges for quantization.
DFQ example results, <1% reduction in accuracy between FP32 and INT8: MobileNet-v2 (top-1 accuracy), ResNet-50 (top-1 accuracy), and DeepLabv3 (mean intersection over union).
DFQ: data-free quantization
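A toy numpy illustration of the cross-layer-equalization step, assuming two consecutive ReLU-separated layers represented as plain matrices; bias absorption and bias correction are omitted.

import numpy as np

# For ReLU, f(s*x) = s*f(x), so scaling output channel i of layer 1 by 1/s_i
# and the matching input channel of layer 2 by s_i preserves the function.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 16)) * rng.uniform(0.1, 10, size=(8, 1))  # out x in
W2 = rng.normal(size=(4, 8))

r1 = np.abs(W1).max(axis=1)   # per-output-channel range of layer 1
r2 = np.abs(W2).max(axis=0)   # per-input-channel range of layer 2
s = np.sqrt(r1 / r2)          # equalizing scale per channel

W1_eq = W1 / s[:, None]
W2_eq = W2 * s[None, :]
# After equalization both ranges equal sqrt(r1 * r2) channel-wise:
assert np.allclose(np.abs(W1_eq).max(axis=1), np.abs(W2_eq).max(axis=0))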
32. 32
INT8, AdaRound quantization
AP: Average Precision
INT8, baseline quantization
Post-training technique that
makes INT8 quantization
more accurate and INT4
quantization possible
AdaRound is
coming soon
to AIMET
82.20
Bitwidth Mean AP (mAP)
FP32
INT8 baseline
quantization
INT8 AdaRound
quantization
49.85
81.21
<1%
Reduction in
accuracy
between
FP32 ad INT8
AdaRound
quantization
32
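A toy numpy illustration of the premise behind AdaRound: for a small enough layer we can enumerate every up/down rounding choice and see that minimizing the error of the layer output can beat round-to-nearest; AdaRound learns this choice at scale instead of enumerating it.

import itertools
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=4)           # weights of a 1-output linear layer
x = rng.normal(size=(256, 4))    # calibration inputs
scale = np.abs(w).max() / 7      # 4-bit-ish symmetric quantization grid

floor_w = np.floor(w / scale)
best = None
for up in itertools.product([0.0, 1.0], repeat=4):   # all up/down choices
    wq = (floor_w + np.array(up)) * scale
    err = np.mean((x @ w - x @ wq) ** 2)             # output-space MSE
    if best is None or err < best[0]:
        best = (err, np.array(up))

nearest = np.round(w / scale) - floor_w              # round-to-nearest choice
print("best rounding:", best[1], "nearest:", nearest)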
34. AIMET Model Zoo
Accurate pre-trained 8-bit quantized models for image classification, semantic segmentation, pose estimation, speech recognition, super resolution, and object detection.
35. 35
*: Comparison between FP32 model and INT8 model quantized with AIMET.
For further details, check out: https://github.com/quic/aimet-model-zoo/
ResNet-50
(v1)
Top-1 accuracy*
FP32 INT8
75.21% 74.96%
MobileNet-
v2-1.4
Top-1 accuracy*
FP32 INT8
75% 74.21%
EfficientNet
Lite
Top-1 accuracy*
FP32 INT8
74.93% 74.99%
SSD
MobileNet-v2
mAP*
FP32 INT8
0.2469 0.2456
RetinaNet
mAP*
FP32 INT8
0.35 0.349
Pose
estimation
mAP*
FP32 INT8
0.383 0.379
SRGAN
PSNR*
FP32 INT8
25.45 24.78
MobileNetV2
Top-1 accuracy*
FP32 INT8
7167% 71.14%
EfficientNet-
lite0
Top-1 accuracy*
FP32 INT8
75.42% 74.44%
DeepLabV3+
mIoU*
FP32 INT8
72.62% 72.22%
MobileNetV2-
SSD-Lite
mAP*
FP32 INT8
68.7% 68.6%
Pose
estimation
mAP*
FP32 INT8
0.364 0.359
SRGAN
PSNR
FP32 INT8
25.51 25.5
DeepSpeech2
WER*
FP32 INT8
9.92% 10.22%
AIMET Model Zoo includes popular quantized AI models
Accuracy is maintained for INT8 models — less than 1% loss*
35
<1%
Loss in
accuracy*
36. 36
Baseline quantization: Post-training quantization
using min-max based quantization grid
AIMET quantization: Model fine-tuned using
Quantization Aware Training in AIMET
FP32 INT8
(AIMET quantization)
INT8
(Baseline quantization)
AIMET Model
Zoo models
preserve accuracy
Visual difference in model
accuracy is telling between
AIMET and baseline
quantization methods
For DeepLabv3+
semantic segmentation,
AIMET quantization
maintains accuracy,
while baseline quantization
method is inaccurate
Accurate
segmentation
Inaccurate
segmentation
37. Join our open-source projects
• AIMET: state-of-the-art quantization and compression techniques (github.com/quic/aimet)
• AIMET Model Zoo: accurate pre-trained 8-bit quantized models (github.com/quic/aimet-model-zoo)
38.
AI model efficiency is crucial
for making AI ubiquitous, leading to
smarter devices and enhanced lives
We are conducting leading research
and development in AI model
efficiency while maintaining accuracy
Our open-source projects, based on
this leading research, are making it
possible for the industry to adopt
efficient AI models at scale