Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries.
November 8, 2023
@QCOMResearch
Generative AI
at the edge
Joseph Soriaga
Senior Director, Technology
Qualcomm Technologies, Inc.
2
Today’s
agenda
Why on-device
generative AI is key
Full-stack AI optimizations
for diffusion models —
Stable Diffusion
Full-stack AI optimizations
for large language models —
Llama 2
Hybrid AI technologies
and architectures
Q&A
3
AIMET is a product of Qualcomm Innovation Center, Inc. Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.
LLM: Large language model; LVM: Language vision model
Leading machine
learning research
for on-device AI
across the entire
spectrum of topics
Platform
research
Applied
research
Fundamental
research
AI research
Generative AI
G-CNN
Self-supervised
learning
Reinforcement
learning
Causality and
system-2
Model quantization,
compression, and NAS
HW-SW
co-design
Distillation of
generative models
Power
management
AI Model Efficiency
Toolkit (AIMET)
Deep learning
for 3D/geometry
Audio and video
compression
AI for wireless
and RF sensing
Energy-efficient
perception
AI for
chip design
On-device
learning
Bayesian
distributed learning
Graph and kernel
optimization
Federated
learning
Deep learning
for graphics
Video recognition
and prediction
Virtual AI
assistant
(LLM)
Diffusion-based
image generation
(LVM)
Voice UI
4
Full-stack
AI research &
optimization
Model, hardware, and software
innovation across each layer
to accelerate AI applications
Early R&D and
technology inventions
essential to leading
the ecosystem forward
Transfer tech to commercial
teams and influence future
research with learnings
from deployment
Vision
Identify a problem
or need; establish
requirements
Ecosystem
collaboration
Collaborate and
drive the ecosystem
toward rapid
commercialization
at scale
~2-3
years
Model
quantization &
optimization
Develop tech & tools
to quantize weights and
modify architecture to run
efficiently on hardware
Software
compilation
Develop tech & tools
to improve graph-level
and kernel-level software
compilation performance
Proof of concept
Target teams integrate models
into final application for stable and
intuitive demonstration
Invention
Invent new methods
that set state-of-the-art
5
World’s first
on-device demo of
Stable Diffusion
running on an
Android phone
1B+ parameter generative AI model
runs efficiently and interactively
Full-stack AI optimization to achieve sub-15
second latency for 20 inference steps
Enhanced privacy, security, reliability,
and cost with on-device processing
Fast development enabled by Qualcomm
AI Research and Qualcomm® AI Stack
At MWC
2023
6
Text generation
(ChatGPT, Bard, Llama, etc.)
Image generation
(Stable Diffusion, MidJourney, etc.)
Code generation
(Codex, etc.)
Input
prompts
“Write a lullaby about
cats and dogs to help
a child fall asleep, include
a golden shepherd”
A great
lullaby is
created in
seconds
Real-life application
of this platform
• Communications
• Journalism
• Publishing
• Creative writing
• Writing assistance
Input
prompts
“Super cute fluffy
cat warrior in armor”
Real-life application of this platform
• Advertisements
• Published
illustrations
• Corporate visuals
• Novel image
generation
Input
prompts
“Create code for a pool
cleaning website with
tabs for cleaning, repairs,
and testimonials”
Real-life application
of this platform
• Web design
• Software development
• Coding
• Technology
A beautiful website
is created in seconds
What is
generative AI?
AI models that create new and
original content like text, images,
video, audio, or other data
Generative AI, foundation models,
and large language models are
sometimes used interchangeably
7
Infrastructure
Cloud
Hyperscaler datacenters,
enterprise servers
Assistant app (using foundation models)
Vertical applications for consumers and knowledge
workers to assist with various tasks such
as writing content, coding, designing, etc.
Tooling/orchestration
Developer tools and platforms for generative AI
Foundation model
Generic models
General-purpose LLMs and others; functionality exposed through APIs
Domain specific models
Purpose-specific model
development and/or
training (enterprise, pro
photo/video, simulated data)
Assistant app (using own model)
Vertical application implementation
from model (e.g., LLM) development
and training to user app
Machine learning apps
Labeling, training, model hub,
optimization, etc.
The
generative
AI ecosystem
stack
is allowing many
apps to proliferate
8
Generative AI will impact use cases across device categories
Gen AI can help improve customer
and employee experience in retail,
such as providing recommendations
for inventory and store layout
“Suggest inventory
and store layout
changes to increase
user satisfaction
in the sports section”
IoT
Gen AI is transforming productivity
by composing emails, creating
presentations, and writing code
PC
Phone
“Make me
reservations for
a weekend getaway
at the place Bob
recommended”
Gen AI can
become a true
digital assistant
XR
Gen AI can
help create
immersive
3D virtual
worlds
based on
simple
prompts
Automotive
Gen AI can
be used for
ADAS/AD
to help improve
drive policy by
predicting the
trajectory and
behavior of
various agents
“Make me a status
presentation for
my boss based
on inputs from
my team”
9
Generative AI with diffusion models for robotics path planning
Stable Diffusion
Denoising an image
with a diffusion model
Generating robot trajectories
Instead of diffusing an image, we diffuse a robot trajectory
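The core idea is the same reverse-diffusion loop used for images, applied to an array of waypoints. A minimal sketch under stated assumptions: `eps_model` is a hypothetical noise-prediction network trained on noised trajectories, and the DDPM schedule below is illustrative, not the planner used in the demo.

```python
import torch

def denoise_trajectory(eps_model, steps=50, waypoints=64):
    # Illustrative DDPM noise schedule.
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)

    traj = torch.randn(waypoints, 2)              # start from pure noise (T x 2 waypoints)
    for t in reversed(range(steps)):
        eps = eps_model(traj, torch.tensor(t))    # hypothetical noise predictor
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        mean = (traj - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(traj) if t > 0 else torch.zeros_like(traj)
        traj = mean + torch.sqrt(betas[t]) * noise
    return traj                                   # denoised waypoint sequence
```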
10
Assuming INT4 parameters
On-device AI
can support
a variety of
Gen AI models
A broad range of Gen AI
capabilities can run on device
using models that range from
1 to 10 billion parameters
We can run models with
over 1 billion parameters
on device today and
anticipate this growing to
over 10 billion parameters
in the coming months
Chart: model size in billions of parameters (log scale, 0.1 to 1000) for Gen AI use cases, spanning text-to-image, dialog and NLP, programming, mathematical reasoning, combinatorial optimization, image understanding, video understanding, and collaborative robotics, with the ranges feasible on device in 2023 and 2024.
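The "assuming INT4 parameters" note can be sanity-checked with simple arithmetic on weight storage; an illustrative calculation (ignoring activations and the KV cache), not a figure from the slide:

```python
def model_size_gb(params_billion: float, bits_per_param: int) -> float:
    """Approximate weight storage, ignoring activations and the KV cache."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9  # decimal GB

# A 7B-parameter model shrinks from ~14 GB at FP16 to ~3.5 GB at INT4, which is
# what makes 1-10 billion parameter models plausible within phone memory budgets.
for name, bits in (("FP16", 16), ("INT8", 8), ("INT4", 4)):
    print(f"7B parameters @ {name}: {model_size_gb(7, bits):.1f} GB")
```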
11
Knowledge
distillation
Create a smaller model
with fewer parameters
Run faster inference
on target deployment
Maintain prediction
quality close to
the teacher
Less training time
Training a smaller
“student” model
to mimic a larger
“teacher” model
Teacher model
Training data
Student model
Logits
Knowledge
distillation
Logits
Soft labels
Match logits of the models
to transfer teacher model
representation and minimize
distillation loss (KL divergence)
Output
Output
Cross
entropy
loss
Ground
truth
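A minimal sketch of the distillation objective described above, assuming PyTorch; the temperature, loss weighting, and reduction choices are illustrative rather than a specific recipe.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Combine soft-label KL loss (teacher -> student) with hard-label cross entropy."""
    # Soft labels: KL divergence between temperature-scaled distributions.
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    kl = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)
    # Hard labels: standard cross entropy against the ground truth.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kl + (1.0 - alpha) * ce
```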
12
On-device
intelligence is
paramount
Process data closest to the
source, complement the cloud
Privacy
Reliability
Low latency
Cost
Energy
Personalization
13
Output image
VAE: Variational Auto Encoder;
CLIP: Contrastive Language-Image Pre-Training
What is
diffusion?
Reverse
diffusion
(subtract
noise or
denoise)
Forward
diffusion
(add noise)
Image
generation
Stable
Diffusion
architecture
UNet is the biggest component
model of Stable Diffusion
Many steps, often 20 or more,
are used for generating
high-quality images
Significant compute
is required
Input prompt
Stable Diffusion
(1B+ parameters)
CLIP text encoder
(123M parameters)
Scheduler UNet
(860M parameters)
VAE decoder
(50M parameters)
Step
Vase in Greek style with intricate patterns and design
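For orientation, the same components can be exercised with the open-source Hugging Face diffusers library; this is illustrative only, since the on-device demo runs a Qualcomm-optimized model through the Qualcomm AI Stack rather than this pipeline.

```python
import torch
from diffusers import StableDiffusionPipeline

# The pipeline bundles the pieces named above: CLIP text encoder, scheduler + UNet
# denoising loop, and VAE decoder. The UNet runs once per inference step (twice per
# step with classifier-free guidance), which is where most of the compute goes.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "Vase in Greek style with intricate patterns and design",
    num_inference_steps=20,
    guidance_scale=7.5,
).images[0]
image.save("vase.png")
```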
14
Original Stable Diffusion UNet
Pruning &
knowledge distillation
More efficient architecture design through pruning and knowledge distillation
Reducing UNet compute (FLOPs), model size, and peak memory usage
Efficient UNet
Convolutional
block
Attention
block
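One simple way to express the pruning half of this step, assuming a PyTorch UNet whose blocks are addressable by module name; the block names, the selection criterion, and the `SkipBlock` helper are hypothetical, and the pruned model would then be fine-tuned with knowledge distillation from the original UNet to recover quality.

```python
import torch.nn as nn

class SkipBlock(nn.Module):
    """Identity stand-in for a pruned block (keeps the surrounding graph intact)."""
    def forward(self, x, *args, **kwargs):
        return x

def prune_blocks(unet: nn.Module, names_to_prune):
    """Replace selected (e.g., attention) blocks with identities to cut FLOPs and memory."""
    for name in names_to_prune:
        parent_name, _, child = name.rpartition(".")
        parent = unet.get_submodule(parent_name) if parent_name else unet
        setattr(parent, child, SkipBlock())
    return unet
```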
15
DDIM: Denoising Diffusion Implicit Models; MSE: Mean-squared error
Step distillation for the DDIM scheduler
Teach the student model to achieve in one step what the teacher achieves in multiple steps
Teacher: 2 UNets
Student: 1 UNet
MSE loss
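A conceptual sketch of that objective, assuming PyTorch; `ddim_update` is a simplified deterministic DDIM step written out as a hypothetical helper (not a library call), and guidance conditioning is omitted.

```python
import torch
import torch.nn.functional as F

def ddim_update(unet, x, t, t_next, cond, alpha_bar):
    """Simplified deterministic DDIM step from timestep t to t_next (eta = 0)."""
    eps = unet(x, t, cond)
    x0 = (x - torch.sqrt(1 - alpha_bar[t]) * eps) / torch.sqrt(alpha_bar[t])
    return torch.sqrt(alpha_bar[t_next]) * x0 + torch.sqrt(1 - alpha_bar[t_next]) * eps

def step_distillation_loss(teacher, student, x_t, t, t_mid, t_next, cond, alpha_bar):
    # Teacher: two consecutive DDIM steps (t -> t_mid -> t_next).
    with torch.no_grad():
        x_mid = ddim_update(teacher, x_t, t, t_mid, cond, alpha_bar)
        x_target = ddim_update(teacher, x_mid, t_mid, t_next, cond, alpha_bar)
    # Student: one DDIM step covering the same interval (t -> t_next).
    x_pred = ddim_update(student, x_t, t, t_next, cond, alpha_bar)
    return F.mse_loss(x_pred, x_target)
```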
16
Baseline (SD-1.5): FID↓ 17.14*, CLIP↑ 0.3037, inference latency 5.05 seconds
Fast SD: FID↓ 20.08, CLIP↑ 0.3004, inference latency 0.56 seconds
Fast
Stable
Diffusion
Reduces UNet
forward passes
to less than 20
Step
distillation
Combines conditional and
unconditional generation
Guidance
conditioning
Reduces compute
(FLOPs), model size,
and peak memory usage
Efficient
UNet
Reparameterization from
epsilon to velocity space
for robust distillation
*: These results are not directly comparable since baseline Stable Diffusion was trained on a dataset over 20x larger than that used for fast Stable Diffusion. SD: Stable Diffusion
Our full-stack AI optimization of Stable Diffusion
significantly improves latency while maintaining accuracy
e-to-v
9x speedup vs. baseline Stable Diffusion
17
Fast
Stable
Diffusion
Stable
Diffusion
V1.5
Similar image quality between our fast implementation and baseline model
Panoramic view of mountains
of Vestrahorn and perfect
reflection in shallow water,
soon after sunrise, Stokksnes,
South Iceland, Polar Regions,
natural lighting
A hyper realistic photo of a
beautiful cabin inside of a forest
and full of trees and plants, with
large aurora borealis in the sky
Underwater world, plants,
flowers, shells, creatures,
high detail, sharp focus, 4k
High quality colored pencil
sketch portrait of an anthro
furry fursona blue fox,
handsome eyes, sketch
doodles surrounding it, photo
of notebook sketch
Japanese garden
at wild life river and
mountain range,
highly detailed,
digital illustration
18
World’s fastest AI
text-to-image
generative AI
on a phone
Takes less than 0.6 seconds to generate
512x512 images from text prompts
Efficient UNet architecture, guidance conditioning,
and step distillation
Full-stack AI optimization to achieve this
improvement
19
LVM: Language vision model
AI acceleration on the Qualcomm®
Hexagon™ NPU of the Snapdragon® 8
Gen 3 Mobile Processor
Full-stack AI
optimization
for LVM
Runs completely
on the device
Significantly reduces
runtime latency and
power consumption
Continuously improves
the Qualcomm® AI Stack
Qualcomm® AI Engine direct
for improved performance and
minimized memory spillage
Knowledge distillation for pruning and removing attention blocks, resulting in an accurate model with improved performance and power efficiency
Designing an efficient diffusion
model through knowledge
distillation for high accuracy
20
LLMs are highly bandwidth-limited rather than compute-limited
Illustration of autoregressive language modeling
The single-token generation architecture of large language models results in high memory bandwidth demands
Example: given the prompt "Recite the first law of robotics," the model generates the answer "A robot may not injure a human being" one token at a time, feeding each new token back in as input.
Huge bandwidth
Each parameter of the
model must be read to
generate each token
(e.g., read 7B parameters
for Llama 7B to generate
a single token)
Diagram: the LLM (embeddings, transformer layers 1 through N, LM head) is stored in DRAM; its weights are streamed over DDR to the NPU and its TCM for every token generated.
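A minimal greedy decoding loop (assuming a Hugging Face-style causal LM interface) makes the point concrete: every new token requires another forward pass, so all the weights are read again even though the KV cache avoids recomputing attention over earlier tokens.

```python
import torch

@torch.no_grad()
def greedy_generate(model, tokenizer, prompt, max_new_tokens=32):
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    past = None
    for _ in range(max_new_tokens):
        # One full pass over all model weights per generated token.
        out = model(input_ids=ids if past is None else ids[:, -1:],
                    past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0], skip_special_tokens=True)
```

Calling `greedy_generate(model, tokenizer, "Recite the first law of robotics")` would produce the answer one token at a time, exactly as in the illustration above.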
21
LLM quantization
motivations
LLM quantization
challenges
A 4x smaller model
(i.e., FP16 -> INT4)
Reduce memory
bandwidth and storage
Reduce latency
Reduce power consumption
Maintain accuracy of
FP published models
Post-training quantization
(PTQ) may not be accurate
enough for 4-bit
The training pipeline (e.g., data
or rewards) is not available for
quantization aware training (QAT)
Shrinking an LLM
for increased performance
while maintaining accuracy
is challenging
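For reference, a minimal symmetric per-row INT4 fake quantization of a weight tensor; real 4-bit pipelines typically use finer-grained (per-group) scales and careful range calibration, which is where PTQ alone can fall short and QAT helps.

```python
import torch

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    """Symmetric per-row INT4 quantize-dequantize; integer values live in [-8, 7]."""
    max_abs = w.abs().amax(dim=1, keepdim=True)
    scale = (max_abs / 7.0).clamp(min=1e-8)
    q = torch.clamp(torch.round(w / scale), -8, 7)
    return q * scale  # dequantized weights; (w - result) is the quantization error

w = torch.randn(4096, 11008)  # roughly a Llama-7B-style FFN weight shape
print("mean abs quantization error:", (w - fake_quant_int4(w)).abs().mean().item())
```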
22
1: Perplexity is averaged over several test sets, including wikitext and c4 (subset)
Quantization-aware training with knowledge distillation
Reduces memory footprint while addressing the quantization challenges of maintaining model accuracy and the lack of the original training pipeline
<1
Point increase
in perplexity1
<1%
Decrease in
accuracy
Construct a
training loop
that can run
two models
on the same
input data
Teacher
Llama-2-Chat 7B
[FP16]
Student
Llama-2-Chat 7B
[INT4]
Dataset
true labels
Teacher logits
Student logits
Loss1: KL loss
(Teacher soft logits,
student soft logits)
Loss2: Cross entropy
loss (True labels,
student hard logits)
KD loss function combines the KL divergence
loss and hard-label based CE loss
Hard logits: no temperature. Soft logits: temperature = 4. (Chart: probability distribution over classes.)
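A skeleton of that training loop, assuming PyTorch and a combined KD loss like the one sketched earlier; the student is assumed to be wrapped with fake-quantization ops (e.g., via a QAT tool such as AIMET), and the data handling is simplified.

```python
import torch

def qat_with_kd(teacher, student, dataloader, optimizer, kd_loss, epochs=1):
    """Run the frozen FP16 teacher and the fake-quantized INT4 student on the same batches."""
    teacher.eval()
    student.train()
    for _ in range(epochs):
        for input_ids, labels in dataloader:
            with torch.no_grad():
                t_logits = teacher(input_ids).logits        # soft targets from the teacher
            s_logits = student(input_ids).logits            # quantizers active in the forward pass
            loss = kd_loss(s_logits.flatten(0, 1),          # (batch*seq, vocab)
                           t_logits.flatten(0, 1),
                           labels.flatten())                # hard labels: true next tokens
            optimizer.zero_grad()
            loss.backward()                                  # gradients flow via straight-through estimators
            optimizer.step()
```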
23
Diagram: for the prompt "Recite the first law of robotics," the small Llama 2 draft model proposes a few tokens at a time (including an incorrect "those"), and the full Llama 2 target model verifies them in a single pass, accepting the tokens that match its own answer, "A robot may not injure a human being," and correcting the rest.
Speculative decoding
speeds up token rate by trading
off compute for bandwidth
A good draft model predicts
with a high acceptance rate
Draft model generates a few
speculative tokens at a time
Target model decides which
to accept in one pass
Training a significantly smaller draft LLM for speculative decoding while maintaining enough accuracy is challenging
Small draft model
motivations
Small draft model
challenges
10x smaller draft model
than target model
Fast results
Reduce memory bandwidth,
storage, latency,
and power consumption
The training pipeline (e.g., data
or rewards) is not available
Cover multiple families,
e.g., 7B and 13B models
Match the distribution of the target
model for higher acceptance rate
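A minimal, greedy-only sketch of the draft/verify loop (production systems use a probabilistic acceptance rule that exactly preserves the target model's distribution); `draft_logits_fn` and `target_logits_fn` are hypothetical callables that return logits for a token sequence.

```python
import torch

@torch.no_grad()
def speculative_step(draft_logits_fn, target_logits_fn, ids, k=4):
    """Draft k tokens sequentially, verify with one target pass, keep the matching prefix."""
    draft_ids = ids
    for _ in range(k):                                      # small draft model: cheap, sequential
        nxt = draft_logits_fn(draft_ids)[:, -1, :].argmax(-1, keepdim=True)
        draft_ids = torch.cat([draft_ids, nxt], dim=-1)

    # Large target model: a single pass scores every position in parallel.
    target_pred = target_logits_fn(draft_ids).argmax(-1)    # target's next-token choice per position
    n = ids.shape[1]
    accepted = 0
    while accepted < k and draft_ids[0, n + accepted] == target_pred[0, n + accepted - 1]:
        accepted += 1
    # Keep the accepted draft tokens plus one token chosen by the target model.
    bonus = target_pred[:, n + accepted - 1 : n + accepted]
    return torch.cat([draft_ids[:, : n + accepted], bonus], dim=-1), accepted
```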
25
Speculative decoding provides speedup with no accuracy loss
Using our research techniques on Llama 2-7B Chat, we achieved
Up to
20
tokens per second
26
AI assistant
enables basic
chat and
chat-assisted
apps on device
Orchestration across
different tasks based
on user query
Powered by
Llama 2 Chat (7B)
Voice UI with Snapdragon Voice
Activation and
Whisper-Small (244M)
Orchestrator
Task classification
Travel planner
API
Llama 2 Chat
(7B param)
User interface
Voice/text/browser
miniLM
(∼33M param)
Snapdragon Voice
Activation &
Whisper-Small
(∼244M param)
LM: Language model
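A sketch of the routing logic, with every helper hypothetical (`transcribe`, `classify_task`, `travel_planner_api`, `llama2_chat`): a miniLM-scale classifier decides whether the query goes to a task-specific API or straight to on-device Llama 2 Chat.

```python
def handle_query(user_input, transcribe, classify_task, travel_planner_api, llama2_chat):
    """Hypothetical on-device orchestrator for the assistant demo flow."""
    # 1. Voice UI: Whisper-Small-style ASR turns speech into text (text passes through).
    text = user_input if isinstance(user_input, str) else transcribe(user_input)

    # 2. Task classification with a small model (miniLM-scale, ~33M parameters).
    task = classify_task(text)                 # e.g., "travel_planning" or "general_chat"

    # 3. Routing: structured task -> dedicated API, otherwise -> Llama 2 Chat (7B) on device.
    if task == "travel_planning":
        plan = travel_planner_api(text)
        return llama2_chat(f"Summarize this travel plan for the user: {plan}")
    return llama2_chat(text)
```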
27
28
World’s fastest
Llama 2-7B
on a phone
Up to 20 tokens per second
Demonstrating both chat and
application interaction on
device
World’s first demonstration of
speculative decoding running
on a phone
At
Snapdragon
Summit
2023
29
QAT: Quantization-aware training; LLM: Large language model
AI acceleration on the Qualcomm®
Hexagon™ NPU of the Snapdragon® 8
Gen 3 Mobile Processor
Full-stack AI
optimization
for LLM
Runs completely
on the device
Significantly reduces
runtime latency and
power consumption
Continuously improves
the Qualcomm® AI Stack
Qualcomm AI Engine direct
for improved performance and
minimized memory spillage
QAT with knowledge distillation for accurate
INT4 target LLM for improved performance
and power efficiency
Designing a good draft model for given
target model through knowledge distillation
for high acceptance and no accuracy loss
30
1: Reuters 2023
Cloud economics will not allow generative AI to scale
Cost per query1
Gen AI applications
Coding assistant
Copy creation
Web search
Personal assistant
Image & video creation
Text summarization
Conversational chatbots
…
Chart: cost per query, traditional vs. generative AI (~10x higher), at the scale of billions of users (e.g., web search)
31
We are a leader in the
realization of the hybrid AI
Convergence of:
Wireless connectivity
Efficient computing
Distributed AI
Unlocking the data
that will fuel our digital
future and generative AI
To scale, the center of
gravity of AI processing is
moving to the edge
Central cloud Edge cloud On device
Hybrid AI
Cost
Energy
Reliability, latency,
& performance
Privacy & security
Personalization
32
Device-centric
hybrid AI
On-device neural network
or rules-based arbiter will
decide where to run the model
More complex models will
use the cloud as needed
It will be seamless to the user
On-device neural network
or rules-based arbiter
Yes
Is cloud needed?
No
The device acts as
the anchor point
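A toy rules-based arbiter, purely illustrative; the inputs (token budget, connectivity, privacy flag) and thresholds are assumptions, not the actual decision policy.

```python
def choose_runtime(prompt_tokens: int, needs_private_data: bool,
                   on_device_limit: int = 2048, cloud_available: bool = True) -> str:
    """Decide where to run a generative AI request: 'device' or 'cloud'."""
    if needs_private_data or not cloud_available:
        return "device"                 # keep personal data local / no connectivity
    if prompt_tokens > on_device_limit:
        return "cloud"                  # too large for the on-device model or context
    return "device"                     # default: lowest latency and cost

print(choose_runtime(prompt_tokens=512, needs_private_data=False))   # -> device
print(choose_runtime(prompt_tokens=8000, needs_private_data=False))  # -> cloud
```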
33
ASR: Automatic speech recognition; CV: Computer vision; TTS: Text to speech
Device-sensing hybrid AI
The device acts as the eyes and ears
Simple
model
ASR, CV, TTS
Speech
Image/video
LLM
Text
Text answer
TTS
Speech
Image/video
LLM
Improved
prompt
Text answer
TTS
Advanced
model
ASR, CV, TTS
• Sensor and
human-machine
interface processing
run on device
• ASR, CV, TTS
• LLM runs in the cloud
• For the advanced version, an on-device orchestrator uses on-device learning and personal data to provide improved prompts to the LLM
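The flow above in a few lines, with all helpers hypothetical; the point is that only a (possibly enriched) text prompt leaves the device.

```python
def device_sensing_query(audio, asr, enrich_prompt, cloud_llm, tts):
    """Device acts as the eyes and ears; the LLM runs in the cloud."""
    text = asr(audio)              # on-device automatic speech recognition
    prompt = enrich_prompt(text)   # advanced version: add on-device personal context
    answer = cloud_llm(prompt)     # only the improved text prompt is sent to the cloud
    return tts(answer)             # on-device text-to-speech for the reply
```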
34
Joint-processing hybrid AI
Multi-token speculative decoding as an example
• LLMs are memory-bound
and produce a single token
per inference, reading in all
the weights
• The smaller draft model
runs on device, sequentially
• The larger target model
runs on the cloud, in
parallel and speculatively
• The good tokens
are accepted
• Results in net speedup
in tokens per unit time
and energy savings
Predict (draft model): four tokens are computed sequentially on device. Verify (target model): the four tokens are speculatively computed in parallel in the cloud. Accept: on average, 2 to 3 of the 4 tokens are correct and accepted.
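A back-of-the-envelope throughput model for why this split helps; all numbers are illustrative, the geometric acceptance model and relative per-pass costs are assumptions, and network latency is ignored.

```python
def expected_speedup(k: int, accept_prob: float, draft_cost: float, target_cost: float) -> float:
    """Rough expected speedup of speculative decoding vs. one target pass per token."""
    # Expected tokens per round: accepted draft tokens (geometric model) + 1 from the target pass.
    expected_tokens = sum(accept_prob ** i for i in range(1, k + 1)) + 1
    cost_per_round = k * draft_cost + target_cost
    baseline_cost = expected_tokens * target_cost
    return baseline_cost / cost_per_round

# 4 draft tokens, 80% per-token acceptance, draft model ~10x cheaper than the target.
print(round(expected_speedup(k=4, accept_prob=0.8, draft_cost=0.1, target_cost=1.0), 2))
```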
35
On-device generative AI offers many
benefits
Generative AI is happening now on the
device
Our on-device AI leadership
is enabling generative AI to scale
Hybrid AI is the future
36
www.qualcomm.com/news/onq
Connect with us
Questions
www.youtube.com/c/QualcommResearch
www.slideshare.net/qualcommwirelessevolution
www.qualcomm.com/research/artificial-intelligence
@QCOMResearch
https://assets.qualcomm.com/mobile-computing-newsletter-sign-up.html
Follow us on:
For more information, visit us at:
qualcomm.com & qualcomm.com/blog
Thank you
Nothing in these materials is an offer to sell any of the components
or devices referenced herein.
©2018-2023 Qualcomm Technologies, Inc. and/or its affiliated
companies. All Rights Reserved.
Qualcomm, Snapdragon, Adreno, Hexagon, Kryo, FastConnect, and
Qualcomm Spectra are trademarks or registered trademarks of Qualcomm
Incorporated. Other products and brand names may be trademarks
or registered trademarks of their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm Incorporated,
Qualcomm Technologies, Inc., and/or other subsidiaries or business units within
the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes
our licensing business, QTL, and the vast majority of our patent portfolio. Qualcomm
Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates, along with its
subsidiaries, substantially all of our engineering, research and development
functions, and substantially all of our products and services businesses, including
our QCT semiconductor business.
Snapdragon and Qualcomm branded products are products of Qualcomm
Technologies, Inc. and/or its subsidiaries. Qualcomm patented technologies are
licensed by Qualcomm Incorporated.

More Related Content

What's hot

An Introduction to Generative AI - May 18, 2023
An Introduction  to Generative AI - May 18, 2023An Introduction  to Generative AI - May 18, 2023
An Introduction to Generative AI - May 18, 2023CoriFaklaris1
 
Using the power of Generative AI at scale
Using the power of Generative AI at scaleUsing the power of Generative AI at scale
Using the power of Generative AI at scaleMaxim Salnikov
 
Using Generative AI
Using Generative AIUsing Generative AI
Using Generative AIMark DeLoura
 
Generative AI Risks & Concerns
Generative AI Risks & ConcernsGenerative AI Risks & Concerns
Generative AI Risks & ConcernsAjitesh Kumar
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfPremNaraindas1
 
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!taozen
 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfDavid Rostcheck
 
Cavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AICavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AICavalry Ventures
 
Responsible Generative AI
Responsible Generative AIResponsible Generative AI
Responsible Generative AICMassociates
 
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬VINCI Digital - Industrial IoT (IIoT) Strategic Advisory
 
ChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptxChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptxJesus Rodriguez
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...David Talby
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models BootcampData Science Dojo
 
Let's talk about GPT: A crash course in Generative AI for researchers
Let's talk about GPT: A crash course in Generative AI for researchersLet's talk about GPT: A crash course in Generative AI for researchers
Let's talk about GPT: A crash course in Generative AI for researchersSteven Van Vaerenbergh
 
ChatGPT and OpenAI.pdf
ChatGPT and OpenAI.pdfChatGPT and OpenAI.pdf
ChatGPT and OpenAI.pdfSonal Tiwari
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGene Leybzon
 
Generative Models and ChatGPT
Generative Models and ChatGPTGenerative Models and ChatGPT
Generative Models and ChatGPTLoic Merckel
 
Generative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptxGenerative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptxColleen Farrelly
 

What's hot (20)

An Introduction to Generative AI - May 18, 2023
An Introduction  to Generative AI - May 18, 2023An Introduction  to Generative AI - May 18, 2023
An Introduction to Generative AI - May 18, 2023
 
Using the power of Generative AI at scale
Using the power of Generative AI at scaleUsing the power of Generative AI at scale
Using the power of Generative AI at scale
 
Using Generative AI
Using Generative AIUsing Generative AI
Using Generative AI
 
Generative AI Risks & Concerns
Generative AI Risks & ConcernsGenerative AI Risks & Concerns
Generative AI Risks & Concerns
 
Unlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdfUnlocking the Power of Generative AI An Executive's Guide.pdf
Unlocking the Power of Generative AI An Executive's Guide.pdf
 
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
The Rise of the LLMs - How I Learned to Stop Worrying & Love the GPT!
 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdf
 
Cavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AICavalry Ventures | Deep Dive: Generative AI
Cavalry Ventures | Deep Dive: Generative AI
 
Responsible Generative AI
Responsible Generative AIResponsible Generative AI
Responsible Generative AI
 
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
𝐆𝐞𝐧𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐀𝐈: 𝐂𝐡𝐚𝐧𝐠𝐢𝐧𝐠 𝐇𝐨𝐰 𝐁𝐮𝐬𝐢𝐧𝐞𝐬𝐬 𝐈𝐧𝐧𝐨𝐯𝐚𝐭𝐞𝐬 𝐚𝐧𝐝 𝐎𝐩𝐞𝐫𝐚𝐭𝐞𝐬
 
ChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptxChatGPT, Foundation Models and Web3.pptx
ChatGPT, Foundation Models and Web3.pptx
 
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
Large Language Models, No-Code, and Responsible AI - Trends in Applied NLP in...
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
 
Let's talk about GPT: A crash course in Generative AI for researchers
Let's talk about GPT: A crash course in Generative AI for researchersLet's talk about GPT: A crash course in Generative AI for researchers
Let's talk about GPT: A crash course in Generative AI for researchers
 
Generative AI
Generative AIGenerative AI
Generative AI
 
ChatGPT and OpenAI.pdf
ChatGPT and OpenAI.pdfChatGPT and OpenAI.pdf
ChatGPT and OpenAI.pdf
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First Session
 
Generative Models and ChatGPT
Generative Models and ChatGPTGenerative Models and ChatGPT
Generative Models and ChatGPT
 
Generative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptxGenerative AI, WiDS 2023.pptx
Generative AI, WiDS 2023.pptx
 
Generative AI
Generative AIGenerative AI
Generative AI
 

Similar to Generative AI at the edge.pdf

AI firsts: Leading from research to proof-of-concept
AI firsts: Leading from research to proof-of-conceptAI firsts: Leading from research to proof-of-concept
AI firsts: Leading from research to proof-of-conceptQualcomm Research
 
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...Edge AI and Vision Alliance
 
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)byteLAKE
 
Parallel universe-issue-29
Parallel universe-issue-29Parallel universe-issue-29
Parallel universe-issue-29DESMOND YUEN
 
“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...
“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...
“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...Edge AI and Vision Alliance
 
Google cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptxGoogle cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptxGDSCNiT
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaData Science Milan
 
Intelligent internet of things with Google Cloud
Intelligent internet of things with Google CloudIntelligent internet of things with Google Cloud
Intelligent internet of things with Google CloudHenrik Hammer Eliassen
 
Leading Research Across the AI Spectrum
Leading Research Across the AI SpectrumLeading Research Across the AI Spectrum
Leading Research Across the AI SpectrumQualcomm Research
 
AI Scalability for the Next Decade
AI Scalability for the Next DecadeAI Scalability for the Next Decade
AI Scalability for the Next DecadePaula Koziol
 
AIoT: Intelligence on Microcontroller
AIoT: Intelligence on MicrocontrollerAIoT: Intelligence on Microcontroller
AIoT: Intelligence on MicrocontrollerAndri Yadi
 
Global ai night sept 2019 - Milwaukee
Global ai night sept 2019 - MilwaukeeGlobal ai night sept 2019 - Milwaukee
Global ai night sept 2019 - MilwaukeeCameron Vetter
 
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloLinaro
 
Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Codemotion
 

Similar to Generative AI at the edge.pdf (20)

AI firsts: Leading from research to proof-of-concept
AI firsts: Leading from research to proof-of-conceptAI firsts: Leading from research to proof-of-concept
AI firsts: Leading from research to proof-of-concept
 
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
 
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
AI for Manufacturing (Machine Vision, Edge AI, Federated Learning)
 
Parallel universe-issue-29
Parallel universe-issue-29Parallel universe-issue-29
Parallel universe-issue-29
 
“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...
“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...
“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...
 
Google cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptxGoogle cloud Study Jam 2023.pptx
Google cloud Study Jam 2023.pptx
 
Serverless machine learning architectures at Helixa
Serverless machine learning architectures at HelixaServerless machine learning architectures at Helixa
Serverless machine learning architectures at Helixa
 
DataArt
DataArtDataArt
DataArt
 
Intelligent internet of things with Google Cloud
Intelligent internet of things with Google CloudIntelligent internet of things with Google Cloud
Intelligent internet of things with Google Cloud
 
Path to continuous delivery
Path to continuous deliveryPath to continuous delivery
Path to continuous delivery
 
AI at Scale in Enterprises
AI at Scale in Enterprises AI at Scale in Enterprises
AI at Scale in Enterprises
 
Leading Research Across the AI Spectrum
Leading Research Across the AI SpectrumLeading Research Across the AI Spectrum
Leading Research Across the AI Spectrum
 
AI Scalability for the Next Decade
AI Scalability for the Next DecadeAI Scalability for the Next Decade
AI Scalability for the Next Decade
 
Machine Learning at the Edge
Machine Learning at the EdgeMachine Learning at the Edge
Machine Learning at the Edge
 
AIoT: Intelligence on Microcontroller
AIoT: Intelligence on MicrocontrollerAIoT: Intelligence on Microcontroller
AIoT: Intelligence on Microcontroller
 
Global ai night sept 2019 - Milwaukee
Global ai night sept 2019 - MilwaukeeGlobal ai night sept 2019 - Milwaukee
Global ai night sept 2019 - Milwaukee
 
The future of AI is hybrid
The future of AI is hybridThe future of AI is hybrid
The future of AI is hybrid
 
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...Faster deep learning solutions from training to inference - Michele Tameni - ...
Faster deep learning solutions from training to inference - Michele Tameni - ...
 
OA centre of excellence
OA centre of excellenceOA centre of excellence
OA centre of excellence
 

More from Qualcomm Research

Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Research
 
Why and what you need to know about 6G in 2022
Why and what you need to know about 6G in 2022Why and what you need to know about 6G in 2022
Why and what you need to know about 6G in 2022Qualcomm Research
 
Understanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdfUnderstanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdfQualcomm Research
 
Enabling the metaverse with 5G- web.pdf
Enabling the metaverse with 5G- web.pdfEnabling the metaverse with 5G- web.pdf
Enabling the metaverse with 5G- web.pdfQualcomm Research
 
Presentation - Model Efficiency for Edge AI
Presentation - Model Efficiency for Edge AIPresentation - Model Efficiency for Edge AI
Presentation - Model Efficiency for Edge AIQualcomm Research
 
Bringing AI research to wireless communication and sensing
Bringing AI research to wireless communication and sensingBringing AI research to wireless communication and sensing
Bringing AI research to wireless communication and sensingQualcomm Research
 
How will sidelink bring a new level of 5G versatility.pdf
How will sidelink bring a new level of 5G versatility.pdfHow will sidelink bring a new level of 5G versatility.pdf
How will sidelink bring a new level of 5G versatility.pdfQualcomm Research
 
Scaling 5G to new frontiers with NR-Light (RedCap)
Scaling 5G to new frontiers with NR-Light (RedCap)Scaling 5G to new frontiers with NR-Light (RedCap)
Scaling 5G to new frontiers with NR-Light (RedCap)Qualcomm Research
 
Realizing mission-critical industrial automation with 5G
Realizing mission-critical industrial automation with 5GRealizing mission-critical industrial automation with 5G
Realizing mission-critical industrial automation with 5GQualcomm Research
 
3GPP Release 17: Completing the first phase of 5G evolution
3GPP Release 17: Completing the first phase of 5G evolution3GPP Release 17: Completing the first phase of 5G evolution
3GPP Release 17: Completing the first phase of 5G evolutionQualcomm Research
 
Setting off the 5G Advanced evolution with 3GPP Release 18
Setting off the 5G Advanced evolution with 3GPP Release 18Setting off the 5G Advanced evolution with 3GPP Release 18
Setting off the 5G Advanced evolution with 3GPP Release 18Qualcomm Research
 
5G positioning for the connected intelligent edge
5G positioning for the connected intelligent edge5G positioning for the connected intelligent edge
5G positioning for the connected intelligent edgeQualcomm Research
 
Enabling on-device learning at scale
Enabling on-device learning at scaleEnabling on-device learning at scale
Enabling on-device learning at scaleQualcomm Research
 
The essential role of AI in the 5G future
The essential role of AI in the 5G futureThe essential role of AI in the 5G future
The essential role of AI in the 5G futureQualcomm Research
 
How AI research is enabling next-gen codecs
How AI research is enabling next-gen codecsHow AI research is enabling next-gen codecs
How AI research is enabling next-gen codecsQualcomm Research
 
Role of localization and environment perception in autonomous driving
Role of localization and environment perception in autonomous drivingRole of localization and environment perception in autonomous driving
Role of localization and environment perception in autonomous drivingQualcomm Research
 
Intelligence at scale through AI model efficiency
Intelligence at scale through AI model efficiencyIntelligence at scale through AI model efficiency
Intelligence at scale through AI model efficiencyQualcomm Research
 
How to build high performance 5G networks with vRAN and O-RAN
How to build high performance 5G networks with vRAN and O-RANHow to build high performance 5G networks with vRAN and O-RAN
How to build high performance 5G networks with vRAN and O-RANQualcomm Research
 
What's in the future of 5G millimeter wave?
What's in the future of 5G millimeter wave? What's in the future of 5G millimeter wave?
What's in the future of 5G millimeter wave? Qualcomm Research
 

More from Qualcomm Research (20)

Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AIQualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
Qualcomm Webinar: Solving Unsolvable Combinatorial Problems with AI
 
Why and what you need to know about 6G in 2022
Why and what you need to know about 6G in 2022Why and what you need to know about 6G in 2022
Why and what you need to know about 6G in 2022
 
Understanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdfUnderstanding the world in 3D with AI.pdf
Understanding the world in 3D with AI.pdf
 
Enabling the metaverse with 5G- web.pdf
Enabling the metaverse with 5G- web.pdfEnabling the metaverse with 5G- web.pdf
Enabling the metaverse with 5G- web.pdf
 
Presentation - Model Efficiency for Edge AI
Presentation - Model Efficiency for Edge AIPresentation - Model Efficiency for Edge AI
Presentation - Model Efficiency for Edge AI
 
Bringing AI research to wireless communication and sensing
Bringing AI research to wireless communication and sensingBringing AI research to wireless communication and sensing
Bringing AI research to wireless communication and sensing
 
How will sidelink bring a new level of 5G versatility.pdf
How will sidelink bring a new level of 5G versatility.pdfHow will sidelink bring a new level of 5G versatility.pdf
How will sidelink bring a new level of 5G versatility.pdf
 
Scaling 5G to new frontiers with NR-Light (RedCap)
Scaling 5G to new frontiers with NR-Light (RedCap)Scaling 5G to new frontiers with NR-Light (RedCap)
Scaling 5G to new frontiers with NR-Light (RedCap)
 
Realizing mission-critical industrial automation with 5G
Realizing mission-critical industrial automation with 5GRealizing mission-critical industrial automation with 5G
Realizing mission-critical industrial automation with 5G
 
3GPP Release 17: Completing the first phase of 5G evolution
3GPP Release 17: Completing the first phase of 5G evolution3GPP Release 17: Completing the first phase of 5G evolution
3GPP Release 17: Completing the first phase of 5G evolution
 
Setting off the 5G Advanced evolution with 3GPP Release 18
Setting off the 5G Advanced evolution with 3GPP Release 18Setting off the 5G Advanced evolution with 3GPP Release 18
Setting off the 5G Advanced evolution with 3GPP Release 18
 
5G positioning for the connected intelligent edge
5G positioning for the connected intelligent edge5G positioning for the connected intelligent edge
5G positioning for the connected intelligent edge
 
Enabling on-device learning at scale
Enabling on-device learning at scaleEnabling on-device learning at scale
Enabling on-device learning at scale
 
The essential role of AI in the 5G future
The essential role of AI in the 5G futureThe essential role of AI in the 5G future
The essential role of AI in the 5G future
 
How AI research is enabling next-gen codecs
How AI research is enabling next-gen codecsHow AI research is enabling next-gen codecs
How AI research is enabling next-gen codecs
 
Role of localization and environment perception in autonomous driving
Role of localization and environment perception in autonomous drivingRole of localization and environment perception in autonomous driving
Role of localization and environment perception in autonomous driving
 
Pioneering 5G broadcast
Pioneering 5G broadcastPioneering 5G broadcast
Pioneering 5G broadcast
 
Intelligence at scale through AI model efficiency
Intelligence at scale through AI model efficiencyIntelligence at scale through AI model efficiency
Intelligence at scale through AI model efficiency
 
How to build high performance 5G networks with vRAN and O-RAN
How to build high performance 5G networks with vRAN and O-RANHow to build high performance 5G networks with vRAN and O-RAN
How to build high performance 5G networks with vRAN and O-RAN
 
What's in the future of 5G millimeter wave?
What's in the future of 5G millimeter wave? What's in the future of 5G millimeter wave?
What's in the future of 5G millimeter wave?
 

Recently uploaded

Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKUXDXConf
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfFIDO Alliance
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityScyllaDB
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfFIDO Alliance
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsStefano
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...marcuskenyatta275
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoTAnalytics
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyUXDXConf
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Patrick Viafore
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessUXDXConf
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceSamy Fodil
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftshyamraj55
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераMark Opanasiuk
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastUXDXConf
 
Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024TopCSSGallery
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...CzechDreamin
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 

Recently uploaded (20)

Connecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAKConnecting the Dots in Product Design at KAYAK
Connecting the Dots in Product Design at KAYAK
 
Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdfLinux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
Linux Foundation Edge _ Overview of FDO Software Components _ Randy at Intel.pdf
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
PLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. StartupsPLAI - Acceleration Program for Generative A.I. Startups
PLAI - Acceleration Program for Generative A.I. Startups
 
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
TEST BANK For, Information Technology Project Management 9th Edition Kathy Sc...
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
A Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System StrategyA Business-Centric Approach to Design System Strategy
A Business-Centric Approach to Design System Strategy
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Oauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoftOauth 2.0 Introduction and Flows with MuleSoft
Oauth 2.0 Introduction and Flows with MuleSoft
 
Intro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджераIntro in Product Management - Коротко про професію продакт менеджера
Intro in Product Management - Коротко про професію продакт менеджера
 
Designing for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at ComcastDesigning for Hardware Accessibility at Comcast
Designing for Hardware Accessibility at Comcast
 
Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024
 
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
Integrating Telephony Systems with Salesforce: Insights and Considerations, B...
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 

Generative AI at the edge.pdf

  • 1. Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries. November 8, 2023 @QCOMResearch Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries. Generative AI at the edge Joseph Soriaga Senior Director, Technology Qualcomm Technologies, Inc.
  • 2. 2 Today’s agenda Why on-device generative AI is key Full-stack AI optimizations for diffusion models — Stable Diffusion Full-stack AI optimizations for large language models — Llama 2 Hybrid AI technologies and architectures Q&A 2
  • 3. 3 AIMET is a product of Qualcomm Innovation Center, Inc. Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc. LLM: Large language mode; LVM: Language vision model Leading machine learning research for on-device AI across the entire spectrum of topics Platform research Applied research Fundamental research AI research Generative AI G-CNN Self-supervised learning Reinforcement learning Causality and system-2 Model quantization, compression, and NAS HW-SW co-design Distillation of generative models Power management AI Model Efficiency Toolkit (AIMET) Deep learning for 3D/geometry Audio and video compression AI for wireless and RF sensing Energy-efficient perception AI for chip design On-device learning Bayesian distributed learning Graph and kernel optimization Federated learning Deep learning for graphics Video recognition and prediction Virtual AI assistant (LLM) Diffusion-based image generation (LVM) Voice UI
  • 4. 4 Full-stack AI research & optimization Model, hardware, and software innovation across each layer to accelerate AI applications Early R&D and technology inventions essential to leading the ecosystem forward Transfer tech to commercial teams and influence future research with learnings from deployment Vision Identify a problem or need; establish requirements Ecosystem collaboration Collaborate and drive the ecosystem toward rapid commercialization at scale ~2-3 years Model quantization & optimization Develop tech & tools to quantize weights and modify architecture to run efficiently on hardware Software compilation Develop tech & tools to improve graph-level and kernel-level software compilation performance Proof of concept Target teams integrate models into final application for stable and intuitive demonstration Invention Invent new methods that set state-of-the-art
  • 5. 5 World’s first on-device demo of Stable Diffusion running on an Android phone 1B+ parameter generative AI model runs efficiently and interactively Full-stack AI optimization to achieve sub-15 second latency for 20 inference steps Enhanced privacy, security, reliability, and cost with on-device processing Fast development enabled by Qualcomm AI Research and Qualcomm® AI Stack At MWC 2023
  • 6. 6 Text generation (ChatGPT, Bard, Llama, etc.) Image generation (Stable Diffusion, MidJourney, etc.) Code generation (Codex, etc.) 6 Input prompts “Write a lullaby about cats and dogs to help a child fall asleep, include a golden shepherd” A great lullaby is created in seconds Real-life application of this platform • Communications, • Journalism, • Publishing, • Creative writing • Writing assistance Input prompts “Super cute fluffy cat warrior in armor” Real-life application of this platform • Advertisements • Published illustrations • Corporate visuals • Novel image generation Input prompts “Create code for a pool cleaning website with tab for cleaning, repairs, and testimonials” Real-life application of this platform • Web design • Software development • Coding • Technology A beautiful website is created in seconds What is generative AI? AI models that create new and original content like text, images, video, audio, or other data Generative AI, foundational models, and large language models are sometimes used interchangeably
  • 7. 7 7 Infrastructure Cloud Hyperscaler datacenters, enterprise servers Assistant app (using foundation models) Vertical applications for consumers and knowledge workers to assist with various tasks such as writing content, coding, designing etc. Tooling/orchestration Developer tools and platforms for generative AI Foundation model Generic models General purpose LLM and others; exposed functionality the APIs Domain specific models Purpose-specific model development and/or training (enterprise, pro photo/video, simulated data) Assistant app (using own model) Vertical application implementation from model (e.g., LLM) development and training to user app Machine learning apps Labeling, training, model hub, optimization, etc. The generative AI ecosystem stack is allowing many apps to proliferate
  • 8. 8 8 Generative AI will impact use cases across device categories Gen AI can help improve customer and employee experience in retail, such as providing recommendations for inventory and store layout “Suggest inventory and store layout changes to increase user satisfaction in the sports section” IoT Gen AI is transforming productivity by composing emails, creating presentations, and writing code PC Phone “Make me reservations for a weekend getaway at the place Bob recommended” Gen AI can become a true digital assistant XR Gen AI can help create immersive 3D virtual worlds based on simple prompts Automotive Gen AI can be used for ADAS/AD to help improve drive policy by predicting the trajectory and behavior of various agents “Make me a status presentation for my boss based on inputs from my team”
  • 9. 9 9 Generative AI with diffusion models for robotics path planning 9 Stable Diffusion Denoising an image with a diffusion model Generating robot trajectories Instead of diffusing an image we diffuse a robot trajectory
  • 10. 10 2024 2023 Assuming INT4 parameters On-device AI can support a variety of Gen AI models A broad number of Gen AI capabilities can run on device using models that range from 1 to 10 billion parameters We can run models with over 1 billion parameters on device today and anticipate this growing to over 10 billion parameters in the coming months 0.1 1 10 100 1000 Collaborative robotics Video understanding Image understanding Combinatorial optimization Mathematical reasoning Programming Dialog and NLP Text-to-image Model size (billions of parameters)
  • 11. 11 Knowledge distillation Create a smaller model with fewer parameters Run faster inference on target deployment Maintain prediction quality close to the teacher Less training time Training a smaller “student” model to mimic a larger “teacher” model Teacher model Training data Student model Logits Knowledge distillation Logits Soft labels Match logits of the models to transfer teacher model representation and minimize distillation loss (KL divergence) Output Output Cross entropy loss Ground truth
  • 12. On-device intelligence is paramount: process data closest to the source, complementing the cloud
Benefits: privacy, reliability, low latency, cost, energy, personalization
  • 13. What is diffusion? Forward diffusion adds noise; reverse diffusion subtracts noise (denoises) to generate an image.
Stable Diffusion architecture (1B+ parameters): input prompt (e.g., "Vase in Greek style with intricate patterns and design") → CLIP text encoder (123M parameters) → scheduler + UNet (860M parameters), iterated step by step → VAE decoder (50M parameters) → output image
UNet is the biggest component model of Stable Diffusion. Many steps, often 20 or more, are used for generating high-quality images, so significant compute is required.
VAE: Variational Autoencoder; CLIP: Contrastive Language-Image Pre-Training
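For orientation, the same kind of pipeline can be run off the shelf with Hugging Face's diffusers library; the checkpoint name, step count, and guidance scale below are assumptions for illustration and are not the optimized on-device stack described in this deck:

```python
# pip install diffusers transformers torch
import torch
from diffusers import StableDiffusionPipeline

# Load a standard Stable Diffusion v1.5 checkpoint (CLIP text encoder + UNet + VAE decoder).
# Assumes a CUDA GPU; on CPU, drop the float16 dtype and the .to("cuda") call.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Each call runs the scheduler/UNet loop for num_inference_steps and decodes with the VAE.
image = pipe(
    "Vase in Greek style with intricate patterns and design",
    num_inference_steps=20,   # ~20 steps, matching the step count noted on the slide
    guidance_scale=7.5,       # classifier-free guidance strength (assumed default)
).images[0]
image.save("vase.png")
```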
  • 14. Efficient UNet: more efficient architecture design through pruning and knowledge distillation
Original Stable Diffusion UNet → pruning & knowledge distillation → Efficient UNet (built from convolutional and attention blocks), reducing UNet compute (FLOPs), model size, and peak memory usage
  • 15. Step distillation for the DDIM scheduler: teach the student model to achieve in one step what the teacher achieves in multiple steps (teacher: 2 UNet passes; student: 1 UNet pass; trained with an MSE loss between their outputs)
DDIM: Denoising Diffusion Implicit Models; MSE: mean-squared error
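A minimal sketch of that objective, assuming generic UNet and scheduler callables; `scheduler.step_from` is a hypothetical placeholder for one DDIM update, not a real library API:

```python
import torch
import torch.nn.functional as F

def step_distillation_loss(student_unet, teacher_unet, scheduler, x_t, t, cond):
    """Step distillation term: the student's single denoising step is regressed
    (MSE) onto the result of two consecutive teacher steps. In practice the
    student runs on a schedule with half as many steps, so its one update spans
    the same interval as the teacher's two."""
    with torch.no_grad():
        # Teacher: two consecutive DDIM updates, e.g., t -> t-1 -> t-2.
        eps1 = teacher_unet(x_t, t, cond)
        x_tm1 = scheduler.step_from(x_t, t, eps1)
        eps2 = teacher_unet(x_tm1, t - 1, cond)
        x_teacher = scheduler.step_from(x_tm1, t - 1, eps2)
    # Student: a single update from t, trained to land on the teacher's result.
    eps_s = student_unet(x_t, t, cond)
    x_student = scheduler.step_from(x_t, t, eps_s)
    return F.mse_loss(x_student, x_teacher)
```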
  • 16. Our full-stack AI optimization of Stable Diffusion significantly improves latency while maintaining accuracy
Fast Stable Diffusion techniques:
• Step distillation: reduces UNet forward passes to fewer than 20
• Guidance conditioning: combines conditional and unconditional generation
• Efficient UNet: reduces compute (FLOPs), model size, and peak memory usage
• e-to-v: reparameterization from epsilon to velocity space for robust distillation
Results (9x speedup vs. baseline Stable Diffusion):
• Baseline (SD-1.5): FID↓ 17.14*, CLIP↑ 0.3037, inference latency 5.05 seconds
• Fast SD: FID↓ 20.08, CLIP↑ 0.3004, inference latency 0.56 seconds
*: These results are not directly comparable since baseline Stable Diffusion was trained with an over 20x larger dataset than Fast Stable Diffusion. SD: Stable Diffusion
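For context on guidance conditioning: standard classifier-free guidance evaluates the UNet twice per step, once unconditionally and once conditioned on the prompt, and blends the two predictions. A minimal sketch of that blend, which the distilled, guidance-conditioned UNet learns to produce in a single pass (variable names are illustrative):

```python
import torch

def guided_noise_prediction(unet, x_t, t, cond_emb, uncond_emb, guidance_scale=7.5):
    """Standard classifier-free guidance: two UNet passes blended per denoising step.
    Guidance conditioning distills this blend into one conditioned pass, halving the
    UNet evaluations per step."""
    eps_uncond = unet(x_t, t, uncond_emb)   # unconditional prediction
    eps_cond = unet(x_t, t, cond_emb)       # prompt-conditioned prediction
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```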
  • 17. Similar image quality between our fast implementation (Fast Stable Diffusion) and the baseline model (Stable Diffusion V1.5). Example prompts:
• "Panoramic view of mountains of Vestrahorn and perfect reflection in shallow water, soon after sunrise, Stokksnes, South Iceland, Polar Regions, natural lighting"
• "A hyper realistic photo of a beautiful cabin inside of a forest and full of trees and plants, with large aurora borealis in the sky"
• "Underwater world, plants, flowers, shells, creatures, high detail, sharp focus, 4k"
• "High quality colored pencil sketch portrait of an anthro furry fursona blue fox, handsome eyes, sketch doodles surrounding it, photo of notebook sketch"
• "Japanese garden at wild life river and mountain range, highly detailed, digital illustration"
  • 18. 18 World’s fastest AI text-to-image generative AI on a phone Takes less than 0.6 seconds for generating 512x512 images from text prompts Efficient UNet architecture, guidance conditioning, and step distillation Full-stack AI optimization to achieve this improvement
  • 19. Full-stack AI optimization for LVM (LVM: language vision model)
• Runs completely on the device, significantly reducing runtime latency and power consumption
• Designing an efficient diffusion model through knowledge distillation for high accuracy: pruning and removing attention blocks results in an accurate model with improved performance and power efficiency
• Continuously improving the Qualcomm® AI Stack: Qualcomm® AI Engine Direct for improved performance and minimized memory spillage
• AI acceleration on the Qualcomm® Hexagon™ NPU of the Snapdragon® 8 Gen 3 Mobile Processor
  • 20. LLMs are highly bandwidth limited rather than compute limited
The single-token generation architecture of large language models results in huge memory bandwidth: each parameter of the model must be read to generate each token (e.g., read 7B parameters for Llama 7B to generate a single token)
[Illustration of autoregressive language modeling: the prompt "Recite the first law of robotics" is completed one token at a time, "A robot may not injure a human being", with the model weights (embeddings, transformer layers 1..N, LM head) read from DRAM/DDR by the NPU and its TCM for every generated token]
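To make the bandwidth pressure concrete, a back-of-the-envelope calculation; the bytes-per-parameter values are standard, and the 20 tokens-per-second rate mirrors the figure quoted later in this deck:

```python
# Rough memory traffic per generated token for a 7B-parameter LLM,
# assuming every weight is read once per token (no on-chip caching of weights).
params = 7e9

for name, bytes_per_param in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    gb_per_token = params * bytes_per_param / 1e9
    # DRAM bandwidth needed to sustain 20 tokens/second at this precision.
    gb_per_s_at_20_tps = gb_per_token * 20
    print(f"{name}: ~{gb_per_token:.1f} GB read per token, "
          f"~{gb_per_s_at_20_tps:.0f} GB/s for 20 tokens/s")
```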
  • 21. Shrinking an LLM for increased performance while maintaining accuracy is challenging
LLM quantization motivations: a 4x smaller model (i.e., FP16 -> INT4), reduced memory bandwidth and storage, reduced latency, reduced power consumption
LLM quantization challenges: maintaining the accuracy of the published FP models; post-training quantization (PTQ) may not be accurate enough for 4-bit; the training pipeline (e.g., data or rewards) is not available for quantization-aware training (QAT)
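As a rough illustration of the FP16 to INT4 reduction, a minimal symmetric per-channel quantizer; the grouping and rounding choices are simplifying assumptions, and production flows (e.g., AIMET) are considerably more sophisticated:

```python
import torch

def quantize_int4_per_channel(w_fp16: torch.Tensor):
    """Symmetric per-output-channel quantization of a weight matrix to INT4.
    Returns integer codes (stored in int8 here for simplicity) and per-row scales."""
    w = w_fp16.float()
    # One scale per output channel (row), mapping max |w| to the INT4 limit 7.
    scale = w.abs().amax(dim=1, keepdim=True) / 7.0 + 1e-8
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale

w = torch.randn(4096, 4096, dtype=torch.float16)
q, s = quantize_int4_per_channel(w)
err = (dequantize(q, s) - w.float()).abs().mean()
print(f"mean abs quantization error: {err.item():.4f}")
# Packed two codes per byte, INT4 weights need ~1/4 the storage and bandwidth of FP16.
```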
  • 22. Quantization-aware training with knowledge distillation reduces the memory footprint while solving the quantization challenges of maintaining model accuracy and the lack of the original training pipeline
Construct a training loop that can run two models on the same input data: teacher Llama-2-Chat 7B [FP16] and student Llama-2-Chat 7B [INT4]
The KD loss function combines the KL divergence loss and the hard-label-based CE loss: Loss1 = KL loss (teacher soft logits, student soft logits; soft logits use temperature = 4); Loss2 = cross-entropy loss (true labels, student hard logits; no temperature)
Results: <1 point increase in perplexity1; <1% decrease in accuracy
1: Perplexity is averaged over several test sets, including wikitext and c4 (subset)
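One way to picture the "two models on the same input data" loop is to keep full-precision shadow weights in the student and fake-quantize them on the fly with a straight-through estimator; this generic sketch is an assumption about the mechanics, not the exact recipe used for the demo:

```python
import torch

def fake_quant_int4(w: torch.Tensor) -> torch.Tensor:
    """Simulate INT4 weights during QAT. The forward pass uses quantized values;
    the straight-through estimator passes gradients to the FP shadow weights."""
    scale = w.abs().amax() / 7.0 + 1e-8
    w_q = torch.clamp(torch.round(w / scale), -8, 7) * scale
    return w + (w_q - w).detach()

class FakeQuantLinear(torch.nn.Linear):
    def forward(self, x):
        return torch.nn.functional.linear(x, fake_quant_int4(self.weight), self.bias)

# Inside the training loop (teacher frozen in FP16, student built with FakeQuantLinear layers):
#   with torch.no_grad():
#       teacher_logits = teacher(batch)
#   student_logits = student(batch)
#   loss = distillation_loss(student_logits, teacher_logits, labels)  # e.g., the earlier KD sketch
#   loss.backward(); optimizer.step()
```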
  • 23. Speculative decoding speeds up token rate by trading off compute for bandwidth
• The draft model generates a few speculative tokens at a time
• The target model decides which to accept in one pass
• A good draft model predicts with a high acceptance rate
[Illustration: a Llama 2 draft model proposes a continuation of "Recite the first law of robotics" with small errors ("A robot may not injure those human..."), and the full Llama 2 target model verifies the proposal in one pass, accepting the correct tokens ("A robot may not injure a human being")]
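A simplified greedy sketch of one draft-and-verify round; real implementations sample from the models' distributions and use a probabilistic accept/reject rule, and `draft`/`target` are assumed to be callables returning next-token logits at every position:

```python
import torch

@torch.no_grad()
def speculative_decode_step(target, draft, tokens, k=4):
    """One round of (greedy) speculative decoding on a (1, L) token tensor."""
    # 1. Draft model proposes k tokens autoregressively (cheap, runs sequentially).
    proposal = tokens.clone()
    for _ in range(k):
        next_tok = draft(proposal)[:, -1].argmax(dim=-1, keepdim=True)
        proposal = torch.cat([proposal, next_tok], dim=-1)

    # 2. Target model scores the whole proposal in a single parallel pass.
    target_pred = target(proposal)[:, -k - 1:-1].argmax(dim=-1)  # target's choice at each drafted position

    # 3. Accept drafted tokens up to the first disagreement, then append the target's own token.
    drafted = proposal[:, -k:]
    matches = (target_pred[0] == drafted[0]).long()
    n_accept = int(matches.cumprod(dim=0).sum())
    accepted = drafted[:, :n_accept]
    correction = target_pred[:, n_accept:n_accept + 1]  # empty if every drafted token was accepted
    return torch.cat([tokens, accepted, correction], dim=-1)
```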
  • 24. Training a significantly smaller draft LLM for speculative decoding while maintaining enough accuracy is challenging
Small draft model motivations: a draft model 10x smaller than the target model, fast results, reduced memory bandwidth, storage, latency, and power consumption
Small draft model challenges: the training pipeline (e.g., data or rewards) is not available; covering multiple model families, e.g., 7B and 13B models; matching the distribution of the target model for a higher acceptance rate
  • 25. Speculative decoding provides speedup with no accuracy loss. Using our research techniques on Llama 2-7B Chat, we achieved up to 20 tokens per second
  • 26. AI assistant enables basic chat and chat-assisted apps on device
• Orchestration across different tasks based on the user query
• Powered by Llama 2 Chat (7B)
• Voice UI with Snapdragon Voice Activation and Whisper-Small (244M)
Components: user interface (voice/text/browser; Snapdragon Voice Activation & Whisper-Small, ~244M param) → orchestrator (task classification, miniLM, ~33M param) → Llama 2 Chat (7B param) or travel planner API
LM: language model
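A highly simplified sketch of what such an orchestrator could look like; the task labels, similarity-based routing, and downstream calls are hypothetical illustrations, not the demo's actual implementation:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

# Hypothetical task labels for the orchestrator's classifier.
TASKS = {
    "travel": "plan a trip, book a hotel or restaurant, make a reservation",
    "chat": "general conversation, questions, writing help",
}

# Small MiniLM sentence encoder, standing in for the ~33M-param miniLM on the slide.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
task_embs = {k: encoder.encode(v, convert_to_tensor=True) for k, v in TASKS.items()}

def route(query: str) -> str:
    """Pick the task whose description is most similar to the user query."""
    q = encoder.encode(query, convert_to_tensor=True)
    return max(TASKS, key=lambda k: float(util.cos_sim(q, task_embs[k])))

def call_travel_planner_api(q): return f"[travel planner would handle: {q}]"  # placeholder stub
def llama2_chat(q): return f"[on-device LLM would answer: {q}]"               # placeholder stub

def handle(query: str) -> str:
    return call_travel_planner_api(query) if route(query) == "travel" else llama2_chat(query)

print(handle("Make me reservations for a weekend getaway at the place Bob recommended"))
```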
  • 27. 27
  • 28. 28 World’s fastest Llama 2-7B on a phone Up to 20 tokens per second Demonstrating both chat and application interaction on device World’s first demonstration of speculative decoding running on a phone At Snapdragon Summit 2023
  • 29. Full-stack AI optimization for LLM (QAT: quantization-aware training; LLM: large language model)
• Runs completely on the device, significantly reducing runtime latency and power consumption
• QAT with knowledge distillation for an accurate INT4 target LLM with improved performance and power efficiency
• Designing a good draft model for a given target model through knowledge distillation, for high acceptance and no accuracy loss
• Continuously improving the Qualcomm® AI Stack: Qualcomm AI Engine Direct for improved performance and minimized memory spillage
• AI acceleration on the Qualcomm® Hexagon™ NPU of the Snapdragon® 8 Gen 3 Mobile Processor
  • 30. Cloud economics will not allow generative AI to scale
Gen AI applications: coding assistant, copy creation, web search, personal assistant, image & video creation, text summarization, conversational chatbots, …
[Chart: cost per query1 for generative AI vs. traditional search, roughly 10x higher for generative AI at the scale of billions of users (e.g., web search)]
1: Reuters 2023
  • 31. We are a leader in the realization of hybrid AI
Convergence of wireless connectivity, efficient computing, and distributed AI, unlocking the data that will fuel our digital future and generative AI
To scale, the center of gravity of AI processing is moving to the edge: hybrid AI spans the central cloud, edge cloud, and on-device processing
Benefits: cost, energy, reliability, latency & performance, privacy & security, personalization
  • 32. Device-centric hybrid AI: the device acts as the anchor point
• An on-device neural network or rules-based arbiter will decide where to run the model ("Is cloud needed?")
• More complex models will use the cloud as needed
• It will be seamless to the user
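A toy sketch of a rules-based arbiter; every rule and threshold below is an invented example of the kind of signal such an arbiter might use:

```python
def choose_execution_target(prompt_tokens: int, needs_fresh_web_data: bool,
                            on_device_max_context: int = 4096,
                            battery_level: float = 1.0) -> str:
    """Toy rules-based arbiter for device-centric hybrid AI.
    The rules and thresholds are made up for illustration; a production arbiter
    could equally be a small learned (neural network) classifier."""
    if needs_fresh_web_data:
        return "cloud"        # the query needs information the device does not have
    if prompt_tokens > on_device_max_context:
        return "cloud"        # too large for the on-device model's context window
    if battery_level < 0.1:
        return "cloud"        # preserve battery
    return "on-device"        # default: keep data, latency, and cost local

print(choose_execution_target(prompt_tokens=512, needs_fresh_web_data=False))  # -> on-device
```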
  • 33. Device-sensing hybrid AI: the device acts as the eyes and ears
• Sensor and human-machine interface processing (ASR, CV, TTS) run on device; the LLM runs in the cloud
• Simple model: on-device ASR/CV turns speech and image/video into text for the cloud LLM, and the LLM's text answer is spoken back through on-device TTS
• Advanced model: an on-device orchestrator additionally uses on-device learning and personal data to provide improved prompts to the LLM
ASR: automatic speech recognition; CV: computer vision; TTS: text to speech
  • 34. Joint-processing hybrid AI: multi-token speculative decoding as an example
• LLMs are memory-bound and produce a single token per inference, reading in all the weights
• The smaller draft model runs on device, sequentially (predict: four tokens sequentially computed on device)
• The larger target model runs in the cloud, in parallel and speculatively (verify: four tokens speculatively computed in parallel in the cloud)
• The good tokens are accepted (accept: on average 2 to 3 of the 4 are correct and accepted)
• Results in a net speedup in tokens per unit time and energy savings (see the rough calculation below)
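A rough calculation of what those acceptance numbers imply; the "+1" token per round follows standard speculative-decoding bookkeeping and is an assumption here, as is ignoring the (much cheaper) draft model's own cost:

```python
# Expected tokens produced per target-model pass with 4 drafted tokens per round.
drafted_per_round = 4
accepted_per_round = 2.5                     # "on average 2 to 3 are correct and accepted"
tokens_per_round = accepted_per_round + 1    # plus one token the target contributes each round

# Plain autoregressive decoding needs 1 sequential target pass per token;
# here one parallel target pass yields ~3.5 tokens, so the big model's
# sequential passes drop by roughly that factor (draft-model cost not included).
print(f"~{tokens_per_round:.1f}x fewer target-model passes per generated token")
```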
  • 35. On-device generative AI offers many benefits. Generative AI is happening now on the device. Our on-device AI leadership is enabling generative AI to scale. Hybrid AI is the future.
  • 37. Follow us on: For more information, visit us at: qualcomm.com & qualcomm.com/blog Thank you Nothing in these materials is an offer to sell any of the components or devices referenced herein. ©2018-2023 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved. Qualcomm, Snapdragon, Adreno, Hexagon, Kryo, FastConnect, and Qualcomm Spectra are trademarks or registered trademarks of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners. References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes our licensing business, QTL, and the vast majority of our patent portfolio. Qualcomm Technologies, Inc., a subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of our engineering, research and development functions, and substantially all of our products and services businesses, including our QCT semiconductor business. Snapdragon and Qualcomm branded products are products of Qualcomm Technologies, Inc. and/or its subsidiaries. Qualcomm patented technologies are licensed by Qualcomm Incorporated.