"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power DSP," a Presentation from Google

TensorFlow has been implemented on Qualcomm's low-power Hexagon DSP to run neural networks for machine learning. On the Snapdragon 820 processor, models run about 8 times faster on the Hexagon DSP than on the CPU and draw 1.4 watts versus roughly 5 watts. Qualcomm's initial gemmlowp test on the HVX ran at more than 5 times the speed of the CPU, with much lower expected power usage. End-to-end performance on InceptionV1 was around 90 ms on the HVX versus 700 ms on the CPU.

Copyright © 2017 Google 1
Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power DSP
Pete Warden
May 2017

• Google’s open source library for machine intelligence
• tensorflow.org launched in Nov 2015
• Used by many production ML projects

TensorFlow and HVX

TensorFlow supports Qualcomm’s Hexagon DSP
• Models run 8X faster, and use 1.4 watts versus ~5 watts on CPU
Qualcomm Snapdragon 820 Processor featuring the Hexagon DSP
DragonBoard 820c

How did this happen?
• Started with my Embedded Vision Alliance talk last year
• “Eight bits are enough”
• Became clear from conversations with Qualcomm that there were possibilities with their existing hardware in the Snapdragon 820

Next Steps
• Qualcomm implemented a quick sanity test using gemmlowp, our open source math library
• That demonstrated 100 GOPs/second on realistic workloads using the HVX
• More than 5x speed of CPU
• Power usage expected to be much lower
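
For readers unfamiliar with eight-bit matrix math, the idea behind a low-precision multiply can be sketched in plain Python. This is a minimal illustration, assuming an affine scheme where a real value is approximately scale × (code − zero_point), with uint8 codes and wide integer accumulators. The function names, scales, and zero points are hypothetical; this is not gemmlowp's API and not the HVX implementation.

```python
def quantize(values, scale, zero_point):
    """Map real numbers to uint8 codes: round(v / scale) + zero_point, clamped to [0, 255]."""
    return [[max(0, min(255, round(v / scale) + zero_point)) for v in row]
            for row in values]


def quantized_matmul(a_q, b_q, a_zero, b_zero):
    """Multiply an m x k by a k x n matrix of uint8 codes using integer accumulation."""
    m, k, n = len(a_q), len(b_q), len(b_q[0])
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0  # wide accumulator: products of 8-bit values overflow 8 bits
            for p in range(k):
                acc += (a_q[i][p] - a_zero) * (b_q[p][j] - b_zero)
            out[i][j] = acc
    return out


def dequantize(acc, a_scale, b_scale):
    """Convert integer accumulators back to real numbers."""
    return [[v * a_scale * b_scale for v in row] for row in acc]


# Hypothetical scales and zero points chosen for this example only.
a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.5, -1.0], [1.5, 2.0]]
a_q = quantize(a, scale=0.05, zero_point=0)
b_q = quantize(b, scale=0.02, zero_point=128)
print(dequantize(quantized_matmul(a_q, b_q, 0, 128), 0.05, 0.02))
```

The inner loop touches only small integers, which is what makes this style of arithmetic a good fit for wide vector units.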

Benchmark Details
• Gemmlowp project has m, n, k values for InceptionV1 matrix multiplies
https://github.com/google/gemmlowp/blob/master/test/benchmark.cc#L283
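
As a rough sketch of how m, n, k values turn into a throughput figure: an (m × k) by (k × n) multiply performs m·n·k multiply-accumulates. The snippet below assumes each multiply-accumulate counts as two operations, which is a common convention but is not stated in the slides. The shapes and timing are placeholders, not the values from benchmark.cc.

```python
def gemm_ops(m, n, k):
    """Operations in an (m x k) by (k x n) multiply, counting multiply and add separately."""
    return 2 * m * n * k


def gops_per_second(shapes, seconds):
    """Giga-operations per second for a list of (m, n, k) multiplies that took `seconds`."""
    total_ops = sum(gemm_ops(m, n, k) for m, n, k in shapes)
    return total_ops / seconds / 1e9


# Illustrative shapes only; these are NOT the values listed in benchmark.cc.
example_shapes = [(64, 1024, 256), (128, 512, 512)]
print(gops_per_second(example_shapes, seconds=0.001))
```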

Results
• Gemmlowp results indicated around 200 GOPs/s (versus 25 GOPs/s on CPU)
• End to end turned out to be around 90 ms, versus 700 ms on CPU
• The gemmlowp benchmark was a good predictor of end-to-end performance
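
The "good predictor" point can be checked with simple arithmetic on the quoted figures. The sketch below also estimates energy per inference, which assumes that the 1.4 W and ~5 W figures quoted earlier in the deck apply to these same end-to-end runs; the slides do not state that explicitly.

```python
cpu_gops, hvx_gops = 25.0, 200.0   # gemmlowp benchmark throughput from the slide
cpu_ms, hvx_ms = 700.0, 90.0       # end-to-end latency from the slide

benchmark_speedup = hvx_gops / cpu_gops    # 8.0
end_to_end_speedup = cpu_ms / hvx_ms       # a little under 8

# Assumption: the power figures quoted earlier apply to these runs.
hvx_joules = 1.4 * hvx_ms / 1000.0         # watts * seconds
cpu_joules = 5.0 * cpu_ms / 1000.0

print(f"benchmark speedup:   {benchmark_speedup:.1f}x")
print(f"end-to-end speedup:  {end_to_end_speedup:.1f}x")
print(f"energy per inference: HVX {hvx_joules:.3f} J, CPU {cpu_joules:.3f} J")
```

The two speedup ratios land close together, which is consistent with the benchmark predicting end-to-end behavior.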

TensorFlow code is at
https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/hvx
Qualcomm code is at
https://source.codeaurora.org/quic/hexagon_nn/nnlib

• Works by assembling a batch of ops on the CPU
• Then sends them off to HVX via FastRPC
• HVX runs it within its own code loop
• Signals the AP when it's done
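
The flow above (assemble a batch, hand it over, run it remotely, signal completion) can be mimicked with a worker thread. This is only an analogy: the queue stands in for the FastRPC hand-off and the thread stands in for the HVX. All names here are hypothetical, and nothing below uses the real FastRPC or Hexagon APIs.

```python
import queue
import threading


def accelerator_loop(inbox):
    """Stand-in for the DSP side: wait for a batch, run every op in it, then signal."""
    while True:
        item = inbox.get()
        if item is None:              # shutdown sentinel
            break
        batch, value, result, done = item
        for op in batch:              # the whole batch runs inside this loop
            value = op(value)
        result.append(value)
        done.set()                    # signal the caller that the batch is finished


def run_batch(inbox, batch, value):
    """Stand-in for the CPU side: submit one whole batch and block until signalled."""
    result, done = [], threading.Event()
    inbox.put((batch, value, result, done))
    done.wait()
    return result[0]


inbox = queue.Queue()
worker = threading.Thread(target=accelerator_loop, args=(inbox,), daemon=True)
worker.start()

batch = [lambda x: x * 2, lambda x: x + 3, lambda x: x * x]   # assembled on the CPU
print(run_batch(inbox, batch, 5))     # computes ((5 * 2) + 3) squared
inbox.put(None)
worker.join()
```

Sending one batch rather than one op at a time means the hand-off cost is paid once per batch, not once per op.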

• TensorFlow handles splitting up the graph between HVX and CPU
• Same mechanism is available for other accelerators too
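
A toy version of the graph split: walk the ops in order and group consecutive ops by whether the accelerator supports them. This sketch treats the graph as a simple chain, whereas real TensorFlow graphs are general dataflow graphs. The support set is hypothetical, and this is not TensorFlow's actual partitioning code.

```python
def partition(ops, supported):
    """Group a linear sequence of op names into contiguous per-device segments."""
    segments = []
    for op in ops:
        device = "HVX" if op in supported else "CPU"
        if segments and segments[-1][0] == device:
            segments[-1][1].append(op)        # extend the current segment
        else:
            segments.append((device, [op]))   # start a new segment on a device change
    return segments


hvx_supported = {"Conv2D", "Relu", "MaxPool"}   # hypothetical support list
graph = ["DecodeJpeg", "Conv2D", "Relu", "MaxPool", "Conv2D", "Softmax"]
for device, ops in partition(graph, hvx_supported):
    print(device, ops)
```

Swapping in a different support set is all it takes to target another device, which mirrors the point that the same mechanism serves other accelerators.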

Embedded TensorFlow

We work closely with chip builders

We work closely with chip builders
• Examples
  • ARM’s Compute Library
  • Movidius’s mvTensor tool
  • CEVA’s conversion tools
  • Intel’s contributions to https://github.com/google/gemmlowp
  • Qualcomm’s HVX collaboration

Lots of demand
• Mobile App Developers (including Snapchat)
• Device builders
  • Home
  • Drones
  • Industrial
  • Medical
  • Automotive

What’s TensorFlow particularly good at?
• Full support for eight bit
• Full stack: researchers, data centers, mobile apps, embedded devices
• Main framework at Google
• Shipping for vision on many apps, including PhotoScan and Snapchat

Embedded TensorFlow Roadmap
• Support for eight-bit training
• On-device training (already being used by Google Keyboard)
• Better export pipeline (Graph Transform Tool)
• Raspberry Pi
• Jetson TX1 experimental support
• Other chips?
• Many more examples

Collaborations with hardware vendors
• ARM and Intel added code to https://github.com/google/gemmlowp
• Worked with many others to support TensorFlow file format for conversion pipelines
• We’re always open to conversations about our requirements and porting

Future
• TensorFlow hands-on training class from the Embedded Vision Alliance, July 13 in Santa Clara
• We’re always looking for chips, tools, systems companies to collaborate with
• Please get in touch!
• petewarden@google.com
