SlideShare a Scribd company logo
1 of 15
Download to read offline
Deep learning at the edge:
100x Inference improvement on edge devices
Dr. Lawrence Spracklen
Director of ML Architecture
AI goes big!
● Deep learning dominates today’s AI
conversations
● Delivering significant accuracy
improvements
● Enabled by explosive growth in model sizes
○ Initially in NLP, but increasingly vision and beyond
● Spiraling training and inference costs
● Unsustainable energy utilization
17,000X
increase
Source: Microsoft Research blog
A brute force solution
Perform matrix multiplications very fast
● GPUs have become AI workhorses
○ 500+ trillion arithmetic operations per second per card
● Hardware performance doubles every few years
● Exploding AI costs
○ 2018 : BERT cost $6K+ to train
○ 2020 : GPT-3 cost $10M+ to train
● Hardware failing to keep pace with growth in
model size Source: NVIDIA
Design new hardware
Edge systems are constrained systems
● Problems compounded at the edge
● Limited memory and computation resources
● Makes large SOTA models on edge devices impractical
● Efficiency step-change needed to drive change at the edge
Necessary to address either the software or the hardware
Improve model efficiency
on existing hardware
Source: NVIDIA blog
Natural Intelligence
Examine the human brain
• Neuron interconnections are sparse
• Neuron activations are sparse
• Neurons are low-fidelity
• Neurons are significantly more complex than AI’s point neuron abstraction
• Humans can learn from very limited examples
Sparse models
• Weight sparsity zeros the majority of a model’s neuron weights
• Activation sparsity dynamically activates context dependent sub-networks
• Enables significant computational savings
• Benefits are multiplicative
• Delivering 100X+ reductions in computational costs for identical model architectures
Input#1 Input#2
You can control your models
• Industry has struggled to accelerate sparse models on general-purpose
hardware
• 2-3X acceleration from 20X reduction in non-zero parameters
• However, it possible to precisely control weight and activation placement to
ensure hardware sympathetic patterns
• Achievable with both directly trained sparse models and ‘pruned’ sparse models
• Enables speedups that directly mirror the reduction in non-zeros
• Achievable on existing hardware (FPGAs, CPUs and GPUs)
• 95% weight sparsity delivers 20X acceleration
ConvNets on FPGAs
• Optimized model inference outperforms the traditional model by 112X
• Remember – same accuracy as the original dense model
• Corresponding improvements in per-operation energy efficiency
• Optimized models fit on ZU3EG edge devices
• Edge device outperforms standard models running on the datacenter-class U250!
FPGA
platform
Power
consumption
Network
type
Number of
networks on chip
Full chip throughput
(words/sec)
Full chip
speedup
Relative
efficiency
Alveo U250 225W
Dense 4 12,195 - 100%
Sparse-Dense 24 689,655 56.5 5675%
Sparse-Sparse 20 1,369,863 112.3 11274%
ZU3EG 24W
Dense 0 0 - -
Sparse-Dense 1 21,053 Infinite 1624%
Sparse-Sparse 1 45,455 Infinite 3505%
Transformers on CPUs and GPUs
● Techniques are compatible with large language and vision models
● These optimized models run extremely efficiently on CPUs and GPUs
● 10-30X throughput improvement for typical models
○ Identical model architecture and accuracy
● Accelerates both training and inference
● Significant reduction in memory footprint
Unleash deep learning at the edge
Low latency opportunity
● Long latencies prevent large model use in many
application spaces
○ Small transformer CPU inference latencies can approach 100ms
● Sparse models also deliver improved latency
● Inference latency can be decreased by 10X+
● Enables models to be used with conversational apps
○ Sub-3ms latency for BERTbase on CPUs
Pick your path
Create optimized
model
Accelerated
CPU Inference
Accelerated
GPU Inference
New
Model
Existing
Model
Accelerated
training on GPUs
TRAIN INFERENCE
Pretrained
Model Accelerated fine-
tuning on CPUs
5-20X performance
Improvements in training
10-30X performance
Improvements in inference
Accelerated
CPU Inference
Accelerated
GPU Inference
5-15X performance
Improvements in inference
Example Applications for these Optimized Models
Voice recognition on
embedded devices
Conversational AI – real-
time responsiveness
Computer vision at the edge Forecasting early disorders
on wearables
Taking it further
• Sparsity represents just the start of what’s possible
• Quantization
• Leverage reduced fidelity representations
• Further reduces memory footprint
• Translates into improved performance on most hardware [2.5X+]
• Knowledge distillation
• Reduce number/size of layers & use large teacher during training to maximize accuracy
• Fewer layers directly reduces inference costs [5X+]
• Benefits are multiplicative…
Deep learning models can run 100X faster on existing edge devices
In Summary
● Edge systems are constrained as explosive AI growth continues
● We can leverage structures and efficiencies from the brain to create
hardware-aware, highly performant sparse models
● Optimized models unlock possibilities for deep learning at the edge:
○ Outperform traditional models by 10-100X+
○ Reduced memory footprint
○ Sub 3ms inference latency for large language models
○ Can run where traditional models don’t fit
Interested in partnering with us?
Contact me at lspracklen@numenta.com
or visit numenta.com.
THANK YOU
QUESTIONS?

More Related Content

What's hot

Siemens_2022_JPM-Digital-Twin-Conference.pdf
Siemens_2022_JPM-Digital-Twin-Conference.pdfSiemens_2022_JPM-Digital-Twin-Conference.pdf
Siemens_2022_JPM-Digital-Twin-Conference.pdfAlekseySolomin
 
Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Itai Yaffe
 
Product Management for AI
Product Management for AIProduct Management for AI
Product Management for AIPeter Skomoroch
 
Enterprise Data Governance Framework With Change Management
Enterprise Data Governance Framework With Change ManagementEnterprise Data Governance Framework With Change Management
Enterprise Data Governance Framework With Change ManagementSlideTeam
 
Data Modeling Techniques
Data Modeling TechniquesData Modeling Techniques
Data Modeling TechniquesDATAVERSITY
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleDatabricks
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise ArchitectureMapR Technologies
 
Dataiku Data Science Studio (datasheet)
Dataiku Data Science Studio (datasheet)Dataiku Data Science Studio (datasheet)
Dataiku Data Science Studio (datasheet)John Cann
 
Machine Learning and Industrie 4.0
Machine Learning and Industrie 4.0Machine Learning and Industrie 4.0
Machine Learning and Industrie 4.0Peter Schleinitz
 
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Ed Fernandez
 
Building a Data Science as a Service Platform in Azure with Databricks
Building a Data Science as a Service Platform in Azure with DatabricksBuilding a Data Science as a Service Platform in Azure with Databricks
Building a Data Science as a Service Platform in Azure with DatabricksDatabricks
 
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesBest Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesEric Kavanagh
 
Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdfQualcomm Research
 
Next IIoT wave: embedded digital twin for manufacturing
Next IIoT wave: embedded digital twin for manufacturing Next IIoT wave: embedded digital twin for manufacturing
Next IIoT wave: embedded digital twin for manufacturing IRS srl
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineMichael Gerke
 
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021Steve Omohundro
 
Towards Digital Twin standards following an open source approach
Towards Digital Twin standards following an open source approachTowards Digital Twin standards following an open source approach
Towards Digital Twin standards following an open source approachFIWARE
 
Neo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time AnalyticsNeo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time AnalyticsNeo4j
 
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)Sergey Karayev
 

What's hot (20)

Digital Twin: Starting the journey
Digital Twin: Starting the journeyDigital Twin: Starting the journey
Digital Twin: Starting the journey
 
Siemens_2022_JPM-Digital-Twin-Conference.pdf
Siemens_2022_JPM-Digital-Twin-Conference.pdfSiemens_2022_JPM-Digital-Twin-Conference.pdf
Siemens_2022_JPM-Digital-Twin-Conference.pdf
 
Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?Why do the majority of Data Science projects never make it to production?
Why do the majority of Data Science projects never make it to production?
 
Product Management for AI
Product Management for AIProduct Management for AI
Product Management for AI
 
Enterprise Data Governance Framework With Change Management
Enterprise Data Governance Framework With Change ManagementEnterprise Data Governance Framework With Change Management
Enterprise Data Governance Framework With Change Management
 
Data Modeling Techniques
Data Modeling TechniquesData Modeling Techniques
Data Modeling Techniques
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
 
Next Generation Enterprise Architecture
Next Generation Enterprise ArchitectureNext Generation Enterprise Architecture
Next Generation Enterprise Architecture
 
Dataiku Data Science Studio (datasheet)
Dataiku Data Science Studio (datasheet)Dataiku Data Science Studio (datasheet)
Dataiku Data Science Studio (datasheet)
 
Machine Learning and Industrie 4.0
Machine Learning and Industrie 4.0Machine Learning and Industrie 4.0
Machine Learning and Industrie 4.0
 
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
Machine Learning Platformization & AutoML: Adopting ML at Scale in the Enterp...
 
Building a Data Science as a Service Platform in Azure with Databricks
Building a Data Science as a Service Platform in Azure with DatabricksBuilding a Data Science as a Service Platform in Azure with Databricks
Building a Data Science as a Service Platform in Azure with Databricks
 
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data PipelinesBest Practices in DataOps: How to Create Agile, Automated Data Pipelines
Best Practices in DataOps: How to Create Agile, Automated Data Pipelines
 
Generative AI at the edge.pdf
Generative AI at the edge.pdfGenerative AI at the edge.pdf
Generative AI at the edge.pdf
 
Next IIoT wave: embedded digital twin for manufacturing
Next IIoT wave: embedded digital twin for manufacturing Next IIoT wave: embedded digital twin for manufacturing
Next IIoT wave: embedded digital twin for manufacturing
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning Pipeline
 
The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021The Future of AI is Generative not Discriminative 5/26/2021
The Future of AI is Generative not Discriminative 5/26/2021
 
Towards Digital Twin standards following an open source approach
Towards Digital Twin standards following an open source approachTowards Digital Twin standards following an open source approach
Towards Digital Twin standards following an open source approach
 
Neo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time AnalyticsNeo4j – The Fastest Path to Scalable Real-Time Analytics
Neo4j – The Fastest Path to Scalable Real-Time Analytics
 
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
Lecture 6: Infrastructure & Tooling (Full Stack Deep Learning - Spring 2021)
 

Similar to Deep learning at the edge: 100x Inference improvement on edge devices

Deploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdfDeploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdfObject Automation
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model CompressionApache MXNet
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning Dr. Swaminathan Kathirvel
 
Deep Learning on Everyday Devices
Deep Learning on Everyday DevicesDeep Learning on Everyday Devices
Deep Learning on Everyday DevicesBrodmann17
 
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...Numenta
 
Give Your Organization Better, Faster Insights & Answers with High Performanc...
Give Your Organization Better, Faster Insights & Answers with High Performanc...Give Your Organization Better, Faster Insights & Answers with High Performanc...
Give Your Organization Better, Faster Insights & Answers with High Performanc...Dell World
 
Fundamentals.pptx
Fundamentals.pptxFundamentals.pptx
Fundamentals.pptxdhivyak49
 
Electi Deep Learning Optimization
Electi  Deep Learning OptimizationElecti  Deep Learning Optimization
Electi Deep Learning OptimizationNikolas Markou
 
Retraining Quantized Neural Network Models with Unlabeled Data.pdf
Retraining Quantized Neural Network Models with Unlabeled Data.pdfRetraining Quantized Neural Network Models with Unlabeled Data.pdf
Retraining Quantized Neural Network Models with Unlabeled Data.pdfKundjanasith Thonglek
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learningGanesan Narayanasamy
 
GTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERGTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERAchronix
 
Ibm pure data system for analytics n200x
Ibm pure data system for analytics n200xIbm pure data system for analytics n200x
Ibm pure data system for analytics n200xIBM Sverige
 
“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...
“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...
“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...Edge AI and Vision Alliance
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2Junli Gu
 
Quad Core Processors - Technology Presentation
Quad Core Processors - Technology PresentationQuad Core Processors - Technology Presentation
Quad Core Processors - Technology Presentationvinaya.hs
 
Live Data: For When Data is Greater than Memory
Live Data: For When Data is Greater than MemoryLive Data: For When Data is Greater than Memory
Live Data: For When Data is Greater than MemoryMemVerge
 
Key Note Session IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...
Key Note Session  IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...Key Note Session  IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...
Key Note Session IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...Surekha Parekh
 

Similar to Deep learning at the edge: 100x Inference improvement on edge devices (20)

Deploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdfDeploying Pretrained Model In Edge IoT Devices.pdf
Deploying Pretrained Model In Edge IoT Devices.pdf
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model Compression
 
FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning FPGA Hardware Accelerator for Machine Learning
FPGA Hardware Accelerator for Machine Learning
 
Deep Learning on Everyday Devices
Deep Learning on Everyday DevicesDeep Learning on Everyday Devices
Deep Learning on Everyday Devices
 
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...
FPGA Conference 2021: Breaking the TOPS ceiling with sparse neural networks -...
 
Give Your Organization Better, Faster Insights & Answers with High Performanc...
Give Your Organization Better, Faster Insights & Answers with High Performanc...Give Your Organization Better, Faster Insights & Answers with High Performanc...
Give Your Organization Better, Faster Insights & Answers with High Performanc...
 
Open power ddl and lms
Open power ddl and lmsOpen power ddl and lms
Open power ddl and lms
 
Fundamentals.pptx
Fundamentals.pptxFundamentals.pptx
Fundamentals.pptx
 
Electi Deep Learning Optimization
Electi  Deep Learning OptimizationElecti  Deep Learning Optimization
Electi Deep Learning Optimization
 
Retraining Quantized Neural Network Models with Unlabeled Data.pdf
Retraining Quantized Neural Network Models with Unlabeled Data.pdfRetraining Quantized Neural Network Models with Unlabeled Data.pdf
Retraining Quantized Neural Network Models with Unlabeled Data.pdf
 
Large Model support and Distribute deep learning
Large Model support and Distribute deep learningLarge Model support and Distribute deep learning
Large Model support and Distribute deep learning
 
GTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWERGTC15-Manoj-Roge-OpenPOWER
GTC15-Manoj-Roge-OpenPOWER
 
Ibm pure data system for analytics n200x
Ibm pure data system for analytics n200xIbm pure data system for analytics n200x
Ibm pure data system for analytics n200x
 
“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...
“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...
“Using a Neural Processor for Always-sensing Cameras,” a Presentation from Ex...
 
APSys Presentation Final copy2
APSys Presentation Final copy2APSys Presentation Final copy2
APSys Presentation Final copy2
 
Quad Core Processors - Technology Presentation
Quad Core Processors - Technology PresentationQuad Core Processors - Technology Presentation
Quad Core Processors - Technology Presentation
 
Deep learning with FPGA
Deep learning with FPGADeep learning with FPGA
Deep learning with FPGA
 
Hipeac 2018 keynote Talk
Hipeac 2018 keynote TalkHipeac 2018 keynote Talk
Hipeac 2018 keynote Talk
 
Live Data: For When Data is Greater than Memory
Live Data: For When Data is Greater than MemoryLive Data: For When Data is Greater than Memory
Live Data: For When Data is Greater than Memory
 
Key Note Session IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...
Key Note Session  IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...Key Note Session  IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...
Key Note Session IDUG DB2 Seminar, 16th April London - Julian Stuhler .Trito...
 

More from Numenta

Brains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth Ramaswamy
Brains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth RamaswamyBrains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth Ramaswamy
Brains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth RamaswamyNumenta
 
Brains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas Miconi
Brains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas MiconiBrains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas Miconi
Brains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas MiconiNumenta
 
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...Numenta
 
Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...
Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...
Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...Numenta
 
Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe...
Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe...Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe...
Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe...Numenta
 
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence Spracklen
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence SpracklenSBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence Spracklen
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence SpracklenNumenta
 
BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...
BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...
BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...Numenta
 
Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...
Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...
Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...Numenta
 
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve OmohundroOpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve OmohundroNumenta
 
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...Numenta
 
Sparsity In The Neocortex, And Its Implications For Machine Learning
Sparsity In The Neocortex,  And Its Implications For Machine LearningSparsity In The Neocortex,  And Its Implications For Machine Learning
Sparsity In The Neocortex, And Its Implications For Machine LearningNumenta
 
The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...
The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...
The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...Numenta
 
Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...
Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...
Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...Numenta
 
Location, Location, Location - A Framework for Intelligence and Cortical Comp...
Location, Location, Location - A Framework for Intelligence and Cortical Comp...Location, Location, Location - A Framework for Intelligence and Cortical Comp...
Location, Location, Location - A Framework for Intelligence and Cortical Comp...Numenta
 
Have We Missed Half of What the Neocortex Does? A New Predictive Framework ...
 Have We Missed Half of What the Neocortex Does?  A New Predictive Framework ... Have We Missed Half of What the Neocortex Does?  A New Predictive Framework ...
Have We Missed Half of What the Neocortex Does? A New Predictive Framework ...Numenta
 
Locations in the Neocortex: A Theory of Sensorimotor Prediction Using Cortica...
Locations in the Neocortex: A Theory of Sensorimotor Prediction Using Cortica...Locations in the Neocortex: A Theory of Sensorimotor Prediction Using Cortica...
Locations in the Neocortex: A Theory of Sensorimotor Prediction Using Cortica...Numenta
 
The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...
The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...
The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...Numenta
 
The Biological Path Toward Strong AI by Matt Taylor (05/17/18)
The Biological Path Toward Strong AI by Matt Taylor (05/17/18)The Biological Path Toward Strong AI by Matt Taylor (05/17/18)
The Biological Path Toward Strong AI by Matt Taylor (05/17/18)Numenta
 
Does the neocortex use grid cell-like mechanisms to learn the structure of ob...
Does the neocortex use grid cell-like mechanisms to learn the structure of ob...Does the neocortex use grid cell-like mechanisms to learn the structure of ob...
Does the neocortex use grid cell-like mechanisms to learn the structure of ob...Numenta
 
Could A Model Of Predictive Voting Explain Many Long-Range Connections? by Su...
Could A Model Of Predictive Voting Explain Many Long-Range Connections? by Su...Could A Model Of Predictive Voting Explain Many Long-Range Connections? by Su...
Could A Model Of Predictive Voting Explain Many Long-Range Connections? by Su...Numenta
 

More from Numenta (20)

Brains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth Ramaswamy
Brains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth RamaswamyBrains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth Ramaswamy
Brains@Bay Meetup: A Primer on Neuromodulatory Systems - Srikanth Ramaswamy
 
Brains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas Miconi
Brains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas MiconiBrains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas Miconi
Brains@Bay Meetup: How to Evolve Your Own Lab Rat - Thomas Miconi
 
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
Brains@Bay Meetup: The Increasing Role of Sensorimotor Experience in Artifici...
 
Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...
Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...
Brains@Bay Meetup: Open-ended Skill Acquisition in Humans and Machines: An Ev...
 
Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe...
Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe...Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe...
Brains@Bay Meetup: The Effect of Sensorimotor Learning on the Learned Represe...
 
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence Spracklen
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence SpracklenSBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence Spracklen
SBMT 2021: Can Neuroscience Insights Transform AI? - Lawrence Spracklen
 
BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...
BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...
BAAI Conference 2021: The Thousand Brains Theory - A Roadmap for Creating Mac...
 
Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...
Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...
Jeff Hawkins NAISys 2020: How the Brain Uses Reference Frames, Why AI Needs t...
 
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve OmohundroOpenAI’s GPT 3 Language Model - guest Steve Omohundro
OpenAI’s GPT 3 Language Model - guest Steve Omohundro
 
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...
CVPR 2020 Workshop: Sparsity in the neocortex, and its implications for conti...
 
Sparsity In The Neocortex, And Its Implications For Machine Learning
Sparsity In The Neocortex,  And Its Implications For Machine LearningSparsity In The Neocortex,  And Its Implications For Machine Learning
Sparsity In The Neocortex, And Its Implications For Machine Learning
 
The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...
The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...
The Thousand Brains Theory: A Framework for Understanding the Neocortex and B...
 
Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...
Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...
Jeff Hawkins Human Brain Project Summit Keynote: "Location, Location, Locatio...
 
Location, Location, Location - A Framework for Intelligence and Cortical Comp...
Location, Location, Location - A Framework for Intelligence and Cortical Comp...Location, Location, Location - A Framework for Intelligence and Cortical Comp...
Location, Location, Location - A Framework for Intelligence and Cortical Comp...
 
Have We Missed Half of What the Neocortex Does? A New Predictive Framework ...
 Have We Missed Half of What the Neocortex Does?  A New Predictive Framework ... Have We Missed Half of What the Neocortex Does?  A New Predictive Framework ...
Have We Missed Half of What the Neocortex Does? A New Predictive Framework ...
 
Locations in the Neocortex: A Theory of Sensorimotor Prediction Using Cortica...
Locations in the Neocortex: A Theory of Sensorimotor Prediction Using Cortica...Locations in the Neocortex: A Theory of Sensorimotor Prediction Using Cortica...
Locations in the Neocortex: A Theory of Sensorimotor Prediction Using Cortica...
 
The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...
The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...
The Predictive Neuron: How Active Dendrites Enable Spatiotemporal Computation...
 
The Biological Path Toward Strong AI by Matt Taylor (05/17/18)
The Biological Path Toward Strong AI by Matt Taylor (05/17/18)The Biological Path Toward Strong AI by Matt Taylor (05/17/18)
The Biological Path Toward Strong AI by Matt Taylor (05/17/18)
 
Does the neocortex use grid cell-like mechanisms to learn the structure of ob...
Does the neocortex use grid cell-like mechanisms to learn the structure of ob...Does the neocortex use grid cell-like mechanisms to learn the structure of ob...
Does the neocortex use grid cell-like mechanisms to learn the structure of ob...
 
Could A Model Of Predictive Voting Explain Many Long-Range Connections? by Su...
Could A Model Of Predictive Voting Explain Many Long-Range Connections? by Su...Could A Model Of Predictive Voting Explain Many Long-Range Connections? by Su...
Could A Model Of Predictive Voting Explain Many Long-Range Connections? by Su...
 

Recently uploaded

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceIES VE
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingWSO2
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringWSO2
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformWSO2
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxMarkSteadman7
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governanceWSO2
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 

Recently uploaded (20)

Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
Decarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational PerformanceDecarbonising Commercial Real Estate: The Role of Operational Performance
Decarbonising Commercial Real Estate: The Role of Operational Performance
 
Quantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation ComputingQuantum Leap in Next-Generation Computing
Quantum Leap in Next-Generation Computing
 
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
TrustArc Webinar - Unified Trust Center for Privacy, Security, Compliance, an...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data PlatformLess Is More: Utilizing Ballerina to Architect a Cloud Data Platform
Less Is More: Utilizing Ballerina to Architect a Cloud Data Platform
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Simplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptxSimplifying Mobile A11y Presentation.pptx
Simplifying Mobile A11y Presentation.pptx
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 

Deep learning at the edge: 100x Inference improvement on edge devices

  • 1. Deep learning at the edge: 100x Inference improvement on edge devices Dr. Lawrence Spracklen Director of ML Architecture
  • 2. AI goes big! ● Deep learning dominates today’s AI conversations ● Delivering significant accuracy improvements ● Enabled by explosive growth in model sizes ○ Initially in NLP, but increasingly vision and beyond ● Spiraling training and inference costs ● Unsustainable energy utilization 17,000X increase Source: Microsoft Research blog
  • 3. A brute force solution Perform matrix multiplications very fast ● GPUs have become AI workhorses ○ 500+ trillion arithmetic operations per second per card ● Hardware performance doubles every few years ● Exploding AI costs ○ 2018 : BERT cost $6K+ to train ○ 2020 : GPT-3 cost $10M+ to train ● Hardware failing to keep pace with growth in model size Source: NVIDIA
  • 4. Design new hardware Edge systems are constrained systems ● Problems compounded at the edge ● Limited memory and computation resources ● Makes large SOTA models on edge devices impractical ● Efficiency step-change needed to drive change at the edge Necessary to address either the software or the hardware Improve model efficiency on existing hardware Source: NVIDIA blog
  • 5. Natural Intelligence Examine the human brain • Neuron interconnections are sparse • Neuron activations are sparse • Neurons are low-fidelity • Neurons are significantly more complex than AI’s point neuron abstraction • Humans can learn from very limited examples
  • 6. Sparse models • Weight sparsity zeros the majority of a model’s neuron weights • Activation sparsity dynamically activates context dependent sub-networks • Enables significant computational savings • Benefits are multiplicative • Delivering 100X+ reductions in computational costs for identical model architectures Input#1 Input#2
  • 7. You can control your models • Industry has struggled to accelerate sparse models on general-purpose hardware • 2-3X acceleration from 20X reduction in non-zero parameters • However, it possible to precisely control weight and activation placement to ensure hardware sympathetic patterns • Achievable with both directly trained sparse models and ‘pruned’ sparse models • Enables speedups that directly mirror the reduction in non-zeros • Achievable on existing hardware (FPGAs, CPUs and GPUs) • 95% weight sparsity delivers 20X acceleration
  • 8. ConvNets on FPGAs • Optimized model inference outperforms the traditional model by 112X • Remember – same accuracy as the original dense model • Corresponding improvements in per-operation energy efficiency • Optimized models fit on ZU3EG edge devices • Edge device outperforms standard models running on the datacenter-class U250! FPGA platform Power consumption Network type Number of networks on chip Full chip throughput (words/sec) Full chip speedup Relative efficiency Alveo U250 225W Dense 4 12,195 - 100% Sparse-Dense 24 689,655 56.5 5675% Sparse-Sparse 20 1,369,863 112.3 11274% ZU3EG 24W Dense 0 0 - - Sparse-Dense 1 21,053 Infinite 1624% Sparse-Sparse 1 45,455 Infinite 3505%
  • 9. Transformers on CPUs and GPUs ● Techniques are compatible with large language and vision models ● These optimized models run extremely efficiently on CPUs and GPUs ● 10-30X throughput improvement for typical models ○ Identical model architecture and accuracy ● Accelerates both training and inference ● Significant reduction in memory footprint Unleash deep learning at the edge
  • 10. Low latency opportunity ● Long latencies prevent large model use in many application spaces ○ Small transformer CPU inference latencies can approach 100ms ● Sparse models also deliver improved latency ● Inference latency can be decreased by 10X+ ● Enables models to be used with conversational apps ○ Sub-3ms latency for BERTbase on CPUs
  • 11. Pick your path Create optimized model Accelerated CPU Inference Accelerated GPU Inference New Model Existing Model Accelerated training on GPUs TRAIN INFERENCE Pretrained Model Accelerated fine- tuning on CPUs 5-20X performance Improvements in training 10-30X performance Improvements in inference Accelerated CPU Inference Accelerated GPU Inference 5-15X performance Improvements in inference
  • 12. Example Applications for these Optimized Models Voice recognition on embedded devices Conversational AI – real- time responsiveness Computer vision at the edge Forecasting early disorders on wearables
  • 13. Taking it further • Sparsity represents just the start of what’s possible • Quantization • Leverage reduced fidelity representations • Further reduces memory footprint • Translates into improved performance on most hardware [2.5X+] • Knowledge distillation • Reduce number/size of layers & use large teacher during training to maximize accuracy • Fewer layers directly reduces inference costs [5X+] • Benefits are multiplicative… Deep learning models can run 100X faster on existing edge devices
  • 14. In Summary ● Edge systems are constrained as explosive AI growth continues ● We can leverage structures and efficiencies from the brain to create hardware-aware, highly performant sparse models ● Optimized models unlock possibilities for deep learning at the edge: ○ Outperform traditional models by 10-100X+ ○ Reduced memory footprint ○ Sub 3ms inference latency for large language models ○ Can run where traditional models don’t fit
  • 15. Interested in partnering with us? Contact me at lspracklen@numenta.com or visit numenta.com. THANK YOU QUESTIONS?