SlideShare a Scribd company logo
1 of 17
Download to read offline
Copyright © 2017 Wave Computing 1
Dr. Chris Nicol, CTO
May 2017
New Dataflow Architecture for Machine Learning
Copyright © 2017 Wave Computing 2
• Founded in 2010
• Tallwood Venture Capital
• Southern Cross Venture Partners
• Headquartered in Campbell, CA
• World class team of 45 dataflow, data science, and systems experts
• 60+ patents
• Invented Dataflow Processing Unit (DPU) architecture to accelerate deep learning training by up to 1000x
• Coarse Grain Reconfigurable Array (CGRA) Architecture
• Static scheduling of data flow graphs onto massive array of processors
• Now accepting qualified customers for Early Access Program
Wave Computing Profile
Copyright © 2017 Wave Computing 3
Wave Computing – Market Focus
Consumer Smart Memory
Wave’s initial market:
Machine Learning in the Datacenter
Wave’s
Dataflow Computing
Technology
Industrial
Copyright © 2017 Wave Computing 4
Wave’s Dataflow Computer for ML Training
2.9 Peta-Ops/Second
256,000 Processing Elements
Over 2 TB Bulk & High Speed Memory
Up to 32 TB SSD Storage
Over 4.5 TB/Sec Dataflow Bandwidth
Up to 4 Wave Computers per Data Center Node
Initially Supporting TensorFlow
Copyright © 2017 Wave Computing 5
Inception v3 DNN Model
https://github.com/tensorflow/models/tree/master/inception
icp1 icp2a icp3b icp4a
input
icp5b icp6c icp7d icp8e icp9
icp10a icp11b
final
Training on Wave Deep Learning Computer
(Wave uses 16-b fixed point with stochastic rounding)
Copyright © 2017 Wave Computing 6
Training Performance: Inception V3
Layers Configuration Weights (M) GMAC DPU Arith.
Units Used
input 0.213 4.64 15,872
icp1 35x35x256 0.255 0.94 3,392
Icp2a 35x35x288 0.276 1.02 3,712
Icp3b 35x35x288 0.284 1.04 3,712
Icp4a 17x17x768 1.152 1.21 4,160
Icp5b 17x17x768 1.294 1.12 4,096
Icp6c 17x17x768 1.688 1.46 5,248
Icp7d 17x17x768 1.688 1.46 5,248
Icp8e 17x17x768 2.138 1.85 6,400
Icp9 17x17x1280 1.696 0.87 3,072
Icp10a 8x8x2048 5.038 0.97 3,584
Icp11b 8x8x2048 6.070 1.17 4,288
final 2.048 0.06 64
Total 23.8 17.7 62,848
Training Processing per 299x299x3 Image Implementation Stat Value
Weights / Image 47.6 MB
Memory Bandwidth 304 GB/sec
TMAC /s (Peak) 53 (@6.67 GHz)
TMAC/s (utilization) 37.67
Utilization of DPU Arithmetic 71%
Images Total 1.28 Million
Training Epochs 90
Training Time 15 Hours
Direct
execution
of dataflow
graph
Load
balancing
for high
throughput
High
Utilization
of MAC
units
https://github.com/tensorflow/models/tree/master/inception
Copyright © 2017 Wave Computing 7
Current Technology Limits DL Performance
Existing CPUs & GPUs are not designed
for dataflow computations
Deep Learning
Dataflow Graph
From http://download.tensorflow.org/paper/whitepaper2015.pdf
Typical compute model
OpenCL and CUDA
• Convert dataflow graph to
sequential threads on host
• CUDA/OpenCL kernels for
acceleration
CPU + GPU/FPGA
* MPI = Message Pass Interface
1. Deep learning is a dataflow application
2. We execute dataflow applications on dataflow processors
Copyright © 2017 Wave Computing 8
Wave Dataflow Processor for Deep Learning
Times
Times
I/O
Softmax
Plus
Plus
Mem I/OSigmoid
Times
Times
Plus
Plus
Softmax
Sigmoid
Programmed on
Deep Learning
Software
Run on Wave
Dataflow Processor
WaveFlow Agent Library
Deep Learning
Networks are
Dataflow
Graphs
Wave Dataflow
Processor
Copyright © 2017 Wave Computing 9
ASIC
Wave Computing
GPU
FPGA DSP
CPU
MPPA
ASSP
Coarse Grain Reconfigurable Array
• Good power efficiency from coarse-grain
statically compiled architecture
• Does not require host with runtime API
(like CUDA or OpenCL)
• Similar flexibility to MPPA, GPU and CPU
• Programmable in HLL
• Dynamic reconfiguration for mix of
applications
• Training and Inference
• ML and non-ML
CGRA Competitive Positioning
PowerEfficiency
(TOPS/Watt) Flexibility
CGRA
Copyright © 2017 Wave Computing 10
Wave DPU Architecture
24 Compute Machines
Cluster (16 PE’s and 8 Arithmetic Units)
Copyright © 2017 Wave Computing 11
Wave DPU Chip
16 nm CMOS Process Node 16 K Processors,
8192 DPU Arithmetic Units
6.7 GHz (typ)
181 Peak Tera-Ops, 8.6 Tera
Bytes/sec Bisection Bandwidth
16 MB Distributed
Data Memory
8 MB Distributed
Instruction Memory
1.71 TB/s I/O Bandwidth
through
4096 Programmable FIFOs
270 GB/s Peak
Memory Bandwidth
2048 outstanding
memory requests
PCIe Gen3-16 Host Interface 4 HMC Interfaces 2 DDR4 2400 Interfaces
Hardware Engine for Fast
Loading of Encrypted
Programs
Up to 32 Programmable
dynamic reconfiguration
zones
Auto-Calibrated Clocking /w
no global signals
Chip Characteristics & Design Features
• Clock-less CGRA is robust to Process, Voltage & Temperature
• Distributed memory architecture for parallel processing
• Optimized for data flow graph execution
• DMA-driven architecture – overlapping I/O and computation
Copyright © 2017 Wave Computing 12
Wave Current Generation DPU Board
Copyright © 2017 Wave Computing 13
Wave’s Dataflow Solution is Plug & Play
Data scientist
TensorFlow Client
Plug & Play
• No change in customer workflow
• Training and Inferencing
• TensorFlow works MUCH faster
TensorFlow subgraphs are executed
on Wave DPU processors in the
Wave Deep Learning Computers
Existing ML
Nodes
…flexible…
Wave Session Host Wave Session Host
Data Center
(Public Cloud or Enterprise)
Client Host
Wave requires NO change to existing Data Center
Data Center connection
Copyright © 2017 Wave Computing 14
Running ML on WaveFlow SW Stack
Wave ML Framework Interface
Proto
Buf
TensorFlow
Frontend
Proto
Buf
Caffe
Frontend
Proto
Buf
MXNet
Frontend
WaveFlow Session Manager
WaveFlow Execution Engine WaveFlow
Agent Library
Client Host Wave Session Host
Data Center
(Public Cloud or Enterprise)
Wave Run-Time Software
Copyright © 2017 Wave Computing 15
• WaveFlow agents are pre compiled off-line using WaveFlow SDK
• Wave provides a complete agent library for TensorFlow
• Customer can create additional agents for differentiation
WaveFlow Agent Library
Customer-supplied
agent source code
Wave-supplied
agent source code
WaveFlow
Agent
Library
WaveFlow SDK
• WFG Compiler
• WFG Linker
• WFG Simulator
• WFG Debugger
• GEMM
• ReLU
• Softmax, etc.
Copyright © 2017 Wave Computing 16
TensorFlow example
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A very simple MNIST classifier.
See extensive documentation at
http://tensorflow.org/tutorials/mnist/beginners/index.md
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
# Import data
from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf
flags = tf.app.flags
FLAGS = flags.FLAGS
flags.DEFINE_string('data_dir', '/tmp/data/', 'Directory for storing data')
mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True)
sess = tf.InteractiveSession()
N = 100
# Create the model
x = tf.placeholder(tf.float32, [N, 784], name="W_x")
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b, name="W_y")
# Define loss and optimizer
y_ = tf.placeholder(tf.float32, [N,10], name="W_y_")
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]),name="W_entropy")
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32),name="W_acc")
tf.initialize_all_variables().run()
tf.train.write_graph(sess.graph_def, ".", "mtrain.pb", True );
for i in range(101):
batch_xs, batch_ys = mnist.train.next_batch(N)
train_step.run({x: batch_xs, y_: batch_ys})
print( accuracy.eval({x: batch_xs, y_: batch_ys}))
print( cross_entropy.eval({x: batch_xs, y_: batch_ys}))
print( sess.run(b) )
Session Manager
Dataflow graph is
partitioned into agents and
dispatched to dataflow
computer for execution
Graph is generated at runtime from
graph provided by Tensorflow.
Copyright © 2017 Wave Computing 17
• Wave is now accepting qualified customers to its Early Access Program (EAP)
• Provides select companies access to a Wave machine learning computer for
testing and benchmarking months before official system sales begin
• For details about participation in the limited number of EAP positions,
contact info@wavecomp.com
Wave’s Early Access Program

More Related Content

What's hot

Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsChester Chen
 
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghDeep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghData Con LA
 
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...Edge AI and Vision Alliance
 
Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Abhishek Thakur
 
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr..."Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...Edge AI and Vision Alliance
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep LearningBrahim HAMADICHAREF
 
Metta Innovations - Introdução ao Deep Learning aplicado a vídeo analytics
Metta Innovations - Introdução ao Deep Learning aplicado a vídeo analyticsMetta Innovations - Introdução ao Deep Learning aplicado a vídeo analytics
Metta Innovations - Introdução ao Deep Learning aplicado a vídeo analyticsEduardo Gaspar
 
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from IntelEdge AI and Vision Alliance
 
Mastering Computer Vision Problems with State-of-the-art Deep Learning
Mastering Computer Vision Problems with State-of-the-art Deep LearningMastering Computer Vision Problems with State-of-the-art Deep Learning
Mastering Computer Vision Problems with State-of-the-art Deep LearningMiguel González-Fierro
 
NVIDIA深度學習教育機構 (DLI): Object detection with jetson
NVIDIA深度學習教育機構 (DLI): Object detection with jetsonNVIDIA深度學習教育機構 (DLI): Object detection with jetson
NVIDIA深度學習教育機構 (DLI): Object detection with jetsonNVIDIA Taiwan
 
Deep Learning with Microsoft R Open
Deep Learning with Microsoft R OpenDeep Learning with Microsoft R Open
Deep Learning with Microsoft R OpenPoo Kuan Hoong
 
"Approaches for Vision-based Driver Monitoring," a Presentation from PathPart...
"Approaches for Vision-based Driver Monitoring," a Presentation from PathPart..."Approaches for Vision-based Driver Monitoring," a Presentation from PathPart...
"Approaches for Vision-based Driver Monitoring," a Presentation from PathPart...Edge AI and Vision Alliance
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning ApplicationsNVIDIA Taiwan
 
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...Willy Marroquin (WillyDevNET)
 
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangIntroduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangPAPIs.io
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntel Nervana
 
"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ...
"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ..."Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ...
"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ...Edge AI and Vision Alliance
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of ComputingIntel Nervana
 

What's hot (20)

Improving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN ApplicationsImproving Hardware Efficiency for DNN Applications
Improving Hardware Efficiency for DNN Applications
 
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika SinghDeep Learning Frameworks Using Spark on YARN by Vartika Singh
Deep Learning Frameworks Using Spark on YARN by Vartika Singh
 
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
“Introduction to the TVM Open Source Deep Learning Compiler Stack,” a Present...
 
Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)Deep Learning Applications (dadada2017)
Deep Learning Applications (dadada2017)
 
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr..."Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...
"Trade-offs in Implementing Deep Neural Networks on FPGAs," a Presentation fr...
 
Recent developments in Deep Learning
Recent developments in Deep LearningRecent developments in Deep Learning
Recent developments in Deep Learning
 
Metta Innovations - Introdução ao Deep Learning aplicado a vídeo analytics
Metta Innovations - Introdução ao Deep Learning aplicado a vídeo analyticsMetta Innovations - Introdução ao Deep Learning aplicado a vídeo analytics
Metta Innovations - Introdução ao Deep Learning aplicado a vídeo analytics
 
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
"Accelerating Deep Learning Using Altera FPGAs," a Presentation from Intel
 
Mastering Computer Vision Problems with State-of-the-art Deep Learning
Mastering Computer Vision Problems with State-of-the-art Deep LearningMastering Computer Vision Problems with State-of-the-art Deep Learning
Mastering Computer Vision Problems with State-of-the-art Deep Learning
 
CNN Quantization
CNN QuantizationCNN Quantization
CNN Quantization
 
NVIDIA深度學習教育機構 (DLI): Object detection with jetson
NVIDIA深度學習教育機構 (DLI): Object detection with jetsonNVIDIA深度學習教育機構 (DLI): Object detection with jetson
NVIDIA深度學習教育機構 (DLI): Object detection with jetson
 
Deep Learning with Microsoft R Open
Deep Learning with Microsoft R OpenDeep Learning with Microsoft R Open
Deep Learning with Microsoft R Open
 
"Approaches for Vision-based Driver Monitoring," a Presentation from PathPart...
"Approaches for Vision-based Driver Monitoring," a Presentation from PathPart..."Approaches for Vision-based Driver Monitoring," a Presentation from PathPart...
"Approaches for Vision-based Driver Monitoring," a Presentation from PathPart...
 
Android and Deep Learning
Android and Deep LearningAndroid and Deep Learning
Android and Deep Learning
 
A Platform for Accelerating Machine Learning Applications
 A Platform for Accelerating Machine Learning Applications A Platform for Accelerating Machine Learning Applications
A Platform for Accelerating Machine Learning Applications
 
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...TECHNICAL OVERVIEW NVIDIA DEEP  LEARNING PLATFORM Giant Leaps in Performance ...
TECHNICAL OVERVIEW NVIDIA DEEP LEARNING PLATFORM Giant Leaps in Performance ...
 
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike WangIntroduction to multi gpu deep learning with DIGITS 2 - Mike Wang
Introduction to multi gpu deep learning with DIGITS 2 - Mike Wang
 
Introduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at GalvanizeIntroduction to Deep Learning and neon at Galvanize
Introduction to Deep Learning and neon at Galvanize
 
"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ...
"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ..."Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ...
"Implementing the TensorFlow Deep Learning Framework on Qualcomm’s Low-power ...
 
Nervana and the Future of Computing
Nervana and the Future of ComputingNervana and the Future of Computing
Nervana and the Future of Computing
 

Similar to "New Dataflow Architecture for Machine Learning," a Presentation from Wave Computing

Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Lablup Inc.
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networksinside-BigData.com
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors DataWorks Summit/Hadoop Summit
 
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Indrajit Poddar
 
OS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLOS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLNordic APIs
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0Ganesan Narayanasamy
 
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionBig Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionEtu Solution
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table NotesTimothy Spann
 
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to ProductionWebinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to Productioniguazio
 
The Convergence of HPC and Deep Learning
The Convergence of HPC and Deep LearningThe Convergence of HPC and Deep Learning
The Convergence of HPC and Deep Learninginside-BigData.com
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...Databricks
 
High Performance Computing Pitch Deck
High Performance Computing Pitch DeckHigh Performance Computing Pitch Deck
High Performance Computing Pitch DeckNicholas Vossburg
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsYong Feng
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...Databricks
 
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUsScalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUsIndrajit Poddar
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platforminside-BigData.com
 
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsHow to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsAltoros
 
OCP U.S. Summit 2017 Presentation
OCP U.S. Summit 2017 PresentationOCP U.S. Summit 2017 Presentation
OCP U.S. Summit 2017 PresentationNetronome
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21JDA Labs MTL
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT_MTL
 

Similar to "New Dataflow Architecture for Machine Learning," a Presentation from Wave Computing (20)

Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)Backend.AI Technical Introduction (19.09 / 2019 Autumn)
Backend.AI Technical Introduction (19.09 / 2019 Autumn)
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
How to Optimize Hortonworks Apache Spark ML Workloads on Modern Processors
 
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
Optimizing Hortonworks Apache Spark machine learning workloads for contempora...
 
OS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of MLOS for AI: Elastic Microservices & the Next Gen of ML
OS for AI: Elastic Microservices & the Next Gen of ML
 
Innovation with ai at scale on the edge vt sept 2019 v0
Innovation with ai at scale  on the edge vt sept 2019 v0Innovation with ai at scale  on the edge vt sept 2019 v0
Innovation with ai at scale on the edge vt sept 2019 v0
 
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data SolutionBig Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
 
Unconference Round Table Notes
Unconference Round Table NotesUnconference Round Table Notes
Unconference Round Table Notes
 
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to ProductionWebinar: Cutting Time, Complexity and Cost from Data Science to Production
Webinar: Cutting Time, Complexity and Cost from Data Science to Production
 
The Convergence of HPC and Deep Learning
The Convergence of HPC and Deep LearningThe Convergence of HPC and Deep Learning
The Convergence of HPC and Deep Learning
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
High Performance Computing Pitch Deck
High Performance Computing Pitch DeckHigh Performance Computing Pitch Deck
High Performance Computing Pitch Deck
 
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflowsCloud nativecomputingtechnologysupportinghpc cognitiveworkflows
Cloud nativecomputingtechnologysupportinghpc cognitiveworkflows
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark with Ma...
 
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUsScalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
Scalable TensorFlow Deep Learning as a Service with Docker, OpenPOWER, and GPUs
 
Tesla Accelerated Computing Platform
Tesla Accelerated Computing PlatformTesla Accelerated Computing Platform
Tesla Accelerated Computing Platform
 
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUsHow to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
How to Run TensorFlow Cheaper in the Cloud Using Elastic GPUs
 
OCP U.S. Summit 2017 Presentation
OCP U.S. Summit 2017 PresentationOCP U.S. Summit 2017 Presentation
OCP U.S. Summit 2017 Presentation
 
Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21Dsdt meetup 2017 11-21
Dsdt meetup 2017 11-21
 
DSDT Meetup Nov 2017
DSDT Meetup Nov 2017DSDT Meetup Nov 2017
DSDT Meetup Nov 2017
 

More from Edge AI and Vision Alliance

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...Edge AI and Vision Alliance
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...Edge AI and Vision Alliance
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...Edge AI and Vision Alliance
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...Edge AI and Vision Alliance
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...Edge AI and Vision Alliance
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...Edge AI and Vision Alliance
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...Edge AI and Vision Alliance
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsightsEdge AI and Vision Alliance
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...Edge AI and Vision Alliance
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...Edge AI and Vision Alliance
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...Edge AI and Vision Alliance
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...Edge AI and Vision Alliance
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from SamsaraEdge AI and Vision Alliance
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...Edge AI and Vision Alliance
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...Edge AI and Vision Alliance
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
“Learning Compact DNN Models for Embedded Vision,” a Presentation from the Un...
 
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
“Introduction to Computer Vision with CNNs,” a Presentation from Mohammad Hag...
 
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
“Selecting Tools for Developing, Monitoring and Maintaining ML Models,” a Pre...
 
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
“Building Accelerated GStreamer Applications for Video and Audio AI,” a Prese...
 
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
“Understanding, Selecting and Optimizing Object Detectors for Edge Applicatio...
 
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
“Introduction to Modern LiDAR for Machine Perception,” a Presentation from th...
 
“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...“Vision-language Representations for Robotics,” a Presentation from the Unive...
“Vision-language Representations for Robotics,” a Presentation from the Unive...
 
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
“ADAS and AV Sensors: What’s Winning and Why?,” a Presentation from TechInsights
 
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 

Recently uploaded

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 

Recently uploaded (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 

"New Dataflow Architecture for Machine Learning," a Presentation from Wave Computing

  • 1. Copyright © 2017 Wave Computing 1 Dr. Chris Nicol, CTO May 2017 New Dataflow Architecture for Machine Learning
  • 2. Copyright © 2017 Wave Computing 2 • Founded in 2010 • Tallwood Venture Capital • Southern Cross Venture Partners • Headquartered in Campbell, CA • World class team of 45 dataflow, data science, and systems experts • 60+ patents • Invented Dataflow Processing Unit (DPU) architecture to accelerate deep learning training by up to 1000x • Coarse Grain Reconfigurable Array (CGRA) Architecture • Static scheduling of data flow graphs onto massive array of processors • Now accepting qualified customers for Early Access Program Wave Computing Profile
  • 3. Copyright © 2017 Wave Computing 3 Wave Computing – Market Focus Consumer Smart Memory Wave’s initial market: Machine Learning in the Datacenter Wave’s Dataflow Computing Technology Industrial
  • 4. Copyright © 2017 Wave Computing 4 Wave’s Dataflow Computer for ML Training 2.9 Peta-Ops/Second 256,000 Processing Elements Over 2 TB Bulk & High Speed Memory Up to 32 TB SSD Storage Over 4.5 TB/Sec Dataflow Bandwidth Up to 4 Wave Computers per Data Center Node Initially Supporting TensorFlow
  • 5. Copyright © 2017 Wave Computing 5 Inception v3 DNN Model https://github.com/tensorflow/models/tree/master/inception icp1 icp2a icp3b icp4a input icp5b icp6c icp7d icp8e icp9 icp10a icp11b final Training on Wave Deep Learning Computer (Wave uses 16-b fixed point with stochastic rounding)
  • 6. Copyright © 2017 Wave Computing 6 Training Performance: Inception V3 Layers Configuration Weights (M) GMAC DPU Arith. Units Used input 0.213 4.64 15,872 icp1 35x35x256 0.255 0.94 3,392 Icp2a 35x35x288 0.276 1.02 3,712 Icp3b 35x35x288 0.284 1.04 3,712 Icp4a 17x17x768 1.152 1.21 4,160 Icp5b 17x17x768 1.294 1.12 4,096 Icp6c 17x17x768 1.688 1.46 5,248 Icp7d 17x17x768 1.688 1.46 5,248 Icp8e 17x17x768 2.138 1.85 6,400 Icp9 17x17x1280 1.696 0.87 3,072 Icp10a 8x8x2048 5.038 0.97 3,584 Icp11b 8x8x2048 6.070 1.17 4,288 final 2.048 0.06 64 Total 23.8 17.7 62,848 Training Processing per 299x299x3 Image Implementation Stat Value Weights / Image 47.6 MB Memory Bandwidth 304 GB/sec TMAC /s (Peak) 53 (@6.67 GHz) TMAC/s (utilization) 37.67 Utilization of DPU Arithmetic 71% Images Total 1.28 Million Training Epochs 90 Training Time 15 Hours Direct execution of dataflow graph Load balancing for high throughput High Utilization of MAC units https://github.com/tensorflow/models/tree/master/inception
  • 7. Copyright © 2017 Wave Computing 7 Current Technology Limits DL Performance Existing CPUs & GPUs are not designed for dataflow computations Deep Learning Dataflow Graph From http://download.tensorflow.org/paper/whitepaper2015.pdf Typical compute model OpenCL and CUDA • Convert dataflow graph to sequential threads on host • CUDA/OpenCL kernels for acceleration CPU + GPU/FPGA * MPI = Message Pass Interface 1. Deep learning is a dataflow application 2. We execute dataflow applications on dataflow processors
  • 8. Copyright © 2017 Wave Computing 8 Wave Dataflow Processor for Deep Learning Times Times I/O Softmax Plus Plus Mem I/OSigmoid Times Times Plus Plus Softmax Sigmoid Programmed on Deep Learning Software Run on Wave Dataflow Processor WaveFlow Agent Library Deep Learning Networks are Dataflow Graphs Wave Dataflow Processor
  • 9. Copyright © 2017 Wave Computing 9 ASIC Wave Computing GPU FPGA DSP CPU MPPA ASSP Coarse Grain Reconfigurable Array • Good power efficiency from coarse-grain statically compiled architecture • Does not require host with runtime API (like CUDA or OpenCL) • Similar flexibility to MPPA, GPU and CPU • Programmable in HLL • Dynamic reconfiguration for mix of applications • Training and Inference • ML and non-ML CGRA Competitive Positioning PowerEfficiency (TOPS/Watt) Flexibility CGRA
  • 10. Copyright © 2017 Wave Computing 10 Wave DPU Architecture 24 Compute Machines Cluster (16 PE’s and 8 Arithmetic Units)
  • 11. Copyright © 2017 Wave Computing 11 Wave DPU Chip 16 nm CMOS Process Node 16 K Processors, 8192 DPU Arithmetic Units 6.7 GHz (typ) 181 Peak Tera-Ops, 8.6 Tera Bytes/sec Bisection Bandwidth 16 MB Distributed Data Memory 8 MB Distributed Instruction Memory 1.71 TB/s I/O Bandwidth through 4096 Programmable FIFOs 270 GB/s Peak Memory Bandwidth 2048 outstanding memory requests PCIe Gen3-16 Host Interface 4 HMC Interfaces 2 DDR4 2400 Interfaces Hardware Engine for Fast Loading of Encrypted Programs Up to 32 Programmable dynamic reconfiguration zones Auto-Calibrated Clocking /w no global signals Chip Characteristics & Design Features • Clock-less CGRA is robust to Process, Voltage & Temperature • Distributed memory architecture for parallel processing • Optimized for data flow graph execution • DMA-driven architecture – overlapping I/O and computation
  • 12. Copyright © 2017 Wave Computing 12 Wave Current Generation DPU Board
  • 13. Copyright © 2017 Wave Computing 13 Wave’s Dataflow Solution is Plug & Play Data scientist TensorFlow Client Plug & Play • No change in customer workflow • Training and Inferencing • TensorFlow works MUCH faster TensorFlow subgraphs are executed on Wave DPU processors in the Wave Deep Learning Computers Existing ML Nodes …flexible… Wave Session Host Wave Session Host Data Center (Public Cloud or Enterprise) Client Host Wave requires NO change to existing Data Center Data Center connection
  • 14. Copyright © 2017 Wave Computing 14 Running ML on WaveFlow SW Stack Wave ML Framework Interface Proto Buf TensorFlow Frontend Proto Buf Caffe Frontend Proto Buf MXNet Frontend WaveFlow Session Manager WaveFlow Execution Engine WaveFlow Agent Library Client Host Wave Session Host Data Center (Public Cloud or Enterprise) Wave Run-Time Software
  • 15. Copyright © 2017 Wave Computing 15 • WaveFlow agents are pre compiled off-line using WaveFlow SDK • Wave provides a complete agent library for TensorFlow • Customer can create additional agents for differentiation WaveFlow Agent Library Customer-supplied agent source code Wave-supplied agent source code WaveFlow Agent Library WaveFlow SDK • WFG Compiler • WFG Linker • WFG Simulator • WFG Debugger • GEMM • ReLU • Softmax, etc.
  • 16. Copyright © 2017 Wave Computing 16 TensorFlow example # Copyright 2015 The TensorFlow Authors. All Rights Reserved. # # Licensed under the Apache License, Version 2.0 (the "License"); # you may not use this file except in compliance with the License. # You may obtain a copy of the License at # # http://www.apache.org/licenses/LICENSE-2.0 # # Unless required by applicable law or agreed to in writing, software # distributed under the License is distributed on an "AS IS" BASIS, # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. # ============================================================================== """A very simple MNIST classifier. See extensive documentation at http://tensorflow.org/tutorials/mnist/beginners/index.md """ from __future__ import absolute_import from __future__ import division from __future__ import print_function # Import data from tensorflow.examples.tutorials.mnist import input_data import tensorflow as tf flags = tf.app.flags FLAGS = flags.FLAGS flags.DEFINE_string('data_dir', '/tmp/data/', 'Directory for storing data') mnist = input_data.read_data_sets(FLAGS.data_dir, one_hot=True) sess = tf.InteractiveSession() N = 100 # Create the model x = tf.placeholder(tf.float32, [N, 784], name="W_x") W = tf.Variable(tf.zeros([784, 10])) b = tf.Variable(tf.zeros([10])) y = tf.nn.softmax(tf.matmul(x, W) + b, name="W_y") # Define loss and optimizer y_ = tf.placeholder(tf.float32, [N,10], name="W_y_") cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]),name="W_entropy") train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy) correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)) accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32),name="W_acc") tf.initialize_all_variables().run() tf.train.write_graph(sess.graph_def, ".", "mtrain.pb", True ); for i in range(101): batch_xs, batch_ys = mnist.train.next_batch(N) train_step.run({x: batch_xs, y_: batch_ys}) print( accuracy.eval({x: batch_xs, y_: batch_ys})) print( cross_entropy.eval({x: batch_xs, y_: batch_ys})) print( sess.run(b) ) Session Manager Dataflow graph is partitioned into agents and dispatched to dataflow computer for execution Graph is generated at runtime from graph provided by Tensorflow.
  • 17. Copyright © 2017 Wave Computing 17 • Wave is now accepting qualified customers to its Early Access Program (EAP) • Provides select companies access to a Wave machine learning computer for testing and benchmarking months before official system sales begin • For details about participation in the limited number of EAP positions, contact info@wavecomp.com Wave’s Early Access Program