AI NEXTCon
Agenda
• Applications
• Framework
• Optimization
Fundamental
Feature ask
Optimize for application
Make optimization reusable for others
Applications
• Semantic similarity: text is encoded as a vector representation, e.g. (0.78, 0.8, 0.4, 0.3, 0.9, ...) and (0.75, 0.6, 0.1, 0.7, 0.2, ...), and similar items are found by nearest neighbor search in semantic space.
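A minimal sketch of the nearest neighbor idea, assuming cosine similarity as the metric (the slide does not name one) and exact brute-force search over a tiny in-memory corpus. The two corpus vectors reuse only the five components visible on the slide; the query vector and the doc ids are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query: Sequence[float],
            corpus: Dict[str, Sequence[float]],
            k: int = 1) -> List[Tuple[str, float]]:
    """Exact (brute-force) k-nearest-neighbor search by cosine similarity."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 5-dimensional embeddings: the visible components of the slide's two vectors.
corpus = {
    "doc_a": [0.78, 0.8, 0.4, 0.3, 0.9],
    "doc_b": [0.75, 0.6, 0.1, 0.7, 0.2],
}
query = [0.8, 0.75, 0.35, 0.3, 0.85]  # hypothetical query embedding
print(nearest(query, corpus, k=1))
```

Brute force scans every stored vector, which is exactly what approximate indexes such as HNSW, KD-trees and hashing try to avoid when the collection holds billions of vectors.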
Query: {is it legal for 17 year old to buy a car}
• Bag-of-words inverted index matching: query terms such as buy, own, car and legal are combined with AND/OR and matched against posting lists (Posting 1, Posting 2, Posting 3, Posting 4, ...).
• L1: BM25F ranking
• L2/L3/L4: re-ranking
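The toy sketch below illustrates bag-of-words retrieval over posting lists followed by an L1-style score. It is an illustration under assumptions, not Bing's pipeline: the documents are hypothetical, tokenization is plain whitespace splitting, and it scores with single-field Okapi BM25 (k1 = 1.2 and b = 0.75 are common defaults) instead of BM25F, which additionally weights fields such as title and body. The query reads the slide's terms as (buy OR own) AND car AND legal.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set


def build_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: map each lowercase term to the set of doc ids that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def match(index: Dict[str, Set[str]], any_of: List[str], all_of: List[str]) -> Set[str]:
    """Boolean match: docs containing at least one term of any_of (OR)
    and every term of all_of (AND). any_of must be non-empty."""
    result: Set[str] = set()
    for term in any_of:
        result |= index.get(term, set())
    for term in all_of:
        result &= index.get(term, set())
    return result


def bm25(query_terms: List[str], doc_id: str, docs: Dict[str, str],
         index: Dict[str, Set[str]], k1: float = 1.2, b: float = 0.75) -> float:
    """Single-field Okapi BM25 score of one document for the query terms."""
    words = docs[doc_id].lower().split()
    avg_len = sum(len(t.split()) for t in docs.values()) / len(docs)
    score = 0.0
    for term in query_terms:
        df = len(index.get(term, set()))
        if df == 0:
            continue
        idf = math.log(1 + (len(docs) - df + 0.5) / (df + 0.5))
        tf = words.count(term)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(words) / avg_len))
    return score


docs = {
    "d1": "legal age to buy a car",
    "d2": "can a minor own a car legally",
    "d3": "best car deals this week",
}
index = build_index(docs)
candidates = match(index, any_of=["buy", "own"], all_of=["car", "legal"])
ranked = sorted(candidates, key=lambda d: bm25(["buy", "car", "legal"], d, docs, index),
                reverse=True)
print(ranked)  # ['d1']; d2 is missed because "legally" does not match the term "legal"
```

The miss on d2 is the kind of term-mismatch recall failure that semantic vector search is meant to address.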
Semantic search can help with recall issues, which account for nearly a third of relevance DSATs.
Query: {how many women voices in Switchboard telephone corpus}
Term alteration and term matching cannot recall the good URLs for this query.
The DL model captures the full context and builds semantic meaning into vectors, so the query vector and the relevant document vectors lie near each other in vector space.
Applications
Query: Where's the nearest fruit smoothies
Location: Omaha, Nebraska
Framework
Deep Learning Platform
• Workloads: Web, Text, Speech, Image, Enterprise
• FrontDoor, in front of DLIS and DLVS
• DLIS pluggable runtime: Microsoft CNTK, TensorFlow (Windows), DeepCPU, TensorFlow (Linux), Caffe, Theano, ..., on native Windows or in Linux containers
• DLVS pluggable runtime: HNSW, K-D Tree, Faiss
• ANN index build; multi-tenancy
• Hardware acceleration: CPU, GPU, FPGA
• Self-serve portal, model development toolkit, model repository
• Customizable runtime
• Privacy and compliance certification
• In production globally: millions of QPS, hundreds of models, hundreds of billions of vectors, 20+ regions
Framework
Optimal distribution to match model requirements to the server fleet:
• Multiple model instances across multiple machines
• Multiple model instances share the same machine
• Different operating systems (Windows and Linux) in the same bed
• Different runtimes (e.g. CNTK, TensorFlow, DeepCPU) on Windows and Linux
• Different machine SKUs in the same bed
[Figure: instances of Model1 to Model8 placed across Windows machines SKU-1, SKU-2, SKU-3 and Linux machines SKU-4, SKU-5.]
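One way to picture matching model requirements to the server fleet is greedy bin packing with compatibility constraints. The sketch below assumes each model needs one OS, a fixed number of abstract capacity units per instance, and a replica count; the field names, capacities and the first-fit-decreasing heuristic are all hypothetical, not the platform's actual scheduler.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Machine:
    name: str
    os: str                      # "windows" or "linux"
    sku: str
    capacity: int                # hypothetical capacity units (e.g. memory slots)
    models: List[str] = field(default_factory=list)
    used: int = 0


@dataclass
class ModelReq:
    name: str
    os: str                      # OS that the model's runtime requires
    size: int                    # capacity units per instance
    instances: int               # number of replicas to place


def place(models: List[ModelReq], fleet: List[Machine]) -> Dict[str, List[str]]:
    """Greedy first-fit-decreasing placement. Larger models are placed first; each
    replica goes to a compatible machine with room, preferring machines that do not
    yet host that model (to spread replicas), then the least-loaded machine."""
    placement: Dict[str, List[str]] = {m.name: [] for m in models}
    for model in sorted(models, key=lambda m: m.size, reverse=True):
        for _ in range(model.instances):
            candidates = [mc for mc in fleet
                          if mc.os == model.os and mc.capacity - mc.used >= model.size]
            if not candidates:
                raise RuntimeError(f"no compatible machine has room for {model.name}")
            target = min(candidates, key=lambda mc: (model.name in mc.models, mc.used))
            target.models.append(model.name)
            target.used += model.size
            placement[model.name].append(target.name)
    return placement


fleet = [
    Machine("win-1", "windows", "SKU-1", capacity=4),
    Machine("win-2", "windows", "SKU-2", capacity=4),
    Machine("lin-5", "linux", "SKU-5", capacity=4),
]
models = [
    ModelReq("Model1", "windows", size=1, instances=3),
    ModelReq("Model6", "windows", size=2, instances=1),
    ModelReq("Model7", "linux", size=2, instances=2),
]
result = place(models, fleet)
print(result)
```

A real placer would also weigh machine SKU (e.g. GPU or FPGA availability), runtime, latency targets and fault domains; those would become extra filters in `candidates` or extra terms in the sort key.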
Query: coffee in Melbourne → online inferencing → semantic representation vector
Document 1, Document 2, ... → batch inferencing → vector set
Query vector + vector set → similarity search
Framework
Vector Recall by Nearest Neighbor Search
• Hashing: hash the query to a bucket, then search among the points in that bucket
• Neighborhood graphs: NNG, HNSW
• Space-partitioning trees: KD-tree, TP-tree
[Figure: semantic words 1, 2 and 3 shown as points in the vector space.]
NNG & TP-tree: Wang, Jingdong, and Shipeng Li. "Query-driven iterated neighborhood graph search for large scale indexing." Proceedings of the 20th ACM International Conference on Multimedia. ACM, 2012.
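The hashing approach (hash the query to a bucket, then search only among the points in that bucket) can be sketched with random-hyperplane locality-sensitive hashing, a standard scheme for cosine similarity. This is a swapped-in illustration: the slide does not say which hash family is used, and the class name, bit count and seed are hypothetical.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity. Each hash bit records which side of
    a random hyperplane a vector lies on, so vectors at a small angle tend to share a bucket."""

    def __init__(self, dim: int, n_bits: int, seed: int = 0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
        self.vectors: Dict[str, Sequence[float]] = {}

    def signature(self, v: Sequence[float]) -> Tuple[int, ...]:
        return tuple(1 if dot(p, v) >= 0 else 0 for p in self.planes)

    def add(self, key: str, v: Sequence[float]) -> None:
        self.vectors[key] = v
        self.buckets[self.signature(v)].append(key)

    def query(self, q: Sequence[float], k: int = 1) -> List[str]:
        # Hash the query to its bucket, then rank only the points in that bucket.
        candidates = self.buckets.get(self.signature(q), [])
        ranked = sorted(candidates, key=lambda key: cosine(q, self.vectors[key]), reverse=True)
        return ranked[:k]


lsh = HyperplaneLSH(dim=5, n_bits=4)
lsh.add("doc_a", [0.78, 0.8, 0.4, 0.3, 0.9])
lsh.add("doc_b", [0.75, 0.6, 0.1, 0.7, 0.2])
print(lsh.query([0.8, 0.75, 0.35, 0.3, 0.85]))
```

Because the query may land in a bucket that holds none of its true neighbors, the result can be empty; practical LSH indexes use several independent hash tables and probe nearby buckets. Graph indexes such as NNG and HNSW avoid bucketing by walking a neighborhood graph instead.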
Optimization
Hardware + software acceleration:
• DeepCPU
• BrainWave / FPGA
• DeepGPU
RNN Serving Performance Challenges
Applications: language modeling, machine translation, machine reading comprehension, conversation bots, speech recognition, …
• Limited parallelism: small batch size; sequential dependency
• Limited bandwidth: vector-matrix multiplication; low data reuse
[Figure: an RNN unrolled over time steps t−1, t, t+1, with inputs X, hidden states S, outputs O, and weights U, W, V shared across steps.]
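To make the two bottlenecks concrete, the sketch below runs a plain RNN step by step and computes the arithmetic intensity of a matrix-vector product. It assumes the common reading of the unrolled figure (U: input-to-hidden, W: hidden-to-hidden, V: hidden-to-output) with a tanh activation; the function names are illustrative.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product: every weight is read once and used for one multiply-add."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_forward(xs: List[Vector], U: Matrix, W: Matrix, V: Matrix) -> List[Vector]:
    """Elman RNN: s_t = tanh(U x_t + W s_{t-1}), o_t = V s_t (output activation omitted).
    Step t needs s_{t-1}, so the loop over time cannot be parallelized."""
    s = [0.0] * len(W)
    outputs = []
    for x in xs:
        s = [math.tanh(a + b) for a, b in zip(matvec(U, x), matvec(W, s))]
        outputs.append(matvec(V, s))
    return outputs


def arithmetic_intensity(rows: int, cols: int, batch: int, bytes_per_weight: int = 4) -> float:
    """FLOPs per byte of weight traffic for a (rows x cols) weight matrix applied to
    `batch` input vectors, counting only weight loads (dominant at small batch sizes)."""
    flops = 2 * rows * cols * batch
    weight_bytes = rows * cols * bytes_per_weight
    return flops / weight_bytes


print(rnn_forward([[1.0], [0.0]], U=[[1.0]], W=[[0.5]], V=[[2.0]]))
print(arithmetic_intensity(512, 512, batch=1))   # 0.5 FLOP per byte
print(arithmetic_intensity(512, 512, batch=32))  # 16.0 FLOP per byte
```

At batch size 1 each 4-byte FP32 weight feeds only one multiply-add (0.5 FLOP per byte), so the step is bound by memory bandwidth rather than compute. Batching 32 requests raises the ratio to 16 FLOP per byte, but latency targets often keep serving batches small.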
Optimization
1. Matrix computation
2. Activation function
3. Operation fusing
4. Affinity
5. Locality
6. Parallelism
7. Task scheduling
In collaboration with Yuxiong He, Minjia Zhang, Samyam Rajbhandari and Wenhan Wang, Microsoft AI and Research.
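As an illustration of matrix computation and operation fusing, the sketch below stacks GRU gate weights so that five small matrix-vector products become two larger ones with identical results. This is an assumed example: the slides do not say exactly which operations DeepCPU fuses, and the function name is hypothetical. In pure Python there is no speedup; the benefit comes when the stacked product is handed to an optimized GEMM kernel that gets more work per call.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def fused_gru_projections(Wz: Matrix, Wr: Matrix, Wh: Matrix,
                          Uz: Matrix, Ur: Matrix,
                          x: Vector, h_prev: Vector) -> Tuple[Vector, ...]:
    """[W_z; W_r; W_h] x covers all input projections in one product.
    [U_z; U_r] h_{t-1} covers the recurrent projections that depend only on h_{t-1}.
    U_h cannot join them: it multiplies r_t * h_{t-1}, known only after r_t is computed."""
    H = len(Wz)
    wx = matvec(Wz + Wr + Wh, x)   # list concatenation stacks rows: (3H x S)
    uh = matvec(Uz + Ur, h_prev)   # (2H x H)
    return wx[:H], wx[H:2 * H], wx[2 * H:], uh[:H], uh[H:]


Wz, Wr, Wh = [[1.0, 2.0], [0.0, 1.0]], [[3.0, 0.0], [1.0, 1.0]], [[0.5, 0.5], [2.0, -1.0]]
Uz, Ur = [[1.0, 0.0], [0.0, 1.0]], [[0.0, 2.0], [1.0, 0.0]]
x, h = [1.0, -1.0], [0.5, 2.0]
print(fused_gru_projections(Wz, Wr, Wh, Uz, Ur, x, h))
```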
Optimization
$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$
$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$
$$h_t = z_t \circ h_{t-1} + (1 - z_t) \circ \tanh(W_h x_t + U_h (r_t \circ h_{t-1}) + b_h)$$
On a machine with 12 cores:
a) 1 core per operation, multiplications done in parallel: many idle cores and an unbalanced load.
b) 12 cores per operation, multiplications done sequentially: poor speedup of intra-op parallelism.
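The three equations translate directly into one GRU step. This is a plain reference sketch (list-based, no batching) assuming the parameters are stored in a dict keyed by the symbols above; the helper names are illustrative.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vectors: Vector) -> Vector:
    return [sum(parts) for parts in zip(*vectors)]


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def gru_cell(x: Vector, h_prev: Vector, p: Dict[str, list]) -> Vector:
    """One GRU step with six matrix-vector products (assumed to be the 'multiplications'
    the core-allocation panels schedule); only U_h (r_t * h_{t-1}) must wait for r_t."""
    z = [sigmoid(a) for a in add(matvec(p["Wz"], x), matvec(p["Uz"], h_prev), p["bz"])]
    r = [sigmoid(a) for a in add(matvec(p["Wr"], x), matvec(p["Ur"], h_prev), p["br"])]
    r_h = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(a) for a in add(matvec(p["Wh"], x), matvec(p["Uh"], r_h), p["bh"])]
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, cand)]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


# With all-zero weights, z_t = 0.5 and the candidate is tanh(0) = 0, so h_t = 0.5 * h_{t-1}.
params: Dict[str, list] = {name: zeros(2, 2) for name in ("Wz", "Wr", "Wh", "Uz", "Ur", "Uh")}
params.update({name: [0.0, 0.0] for name in ("bz", "br", "bh")})
print(gru_cell([1.0, 1.0], [1.0, -2.0], params))  # [0.5, -1.0]
```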
Optimization
On the same 12-core machine:
c) 4 cores per operation, with a bad scheduling order.
d) An optimized configuration that reduces latency by accounting for:
✓ Workload size
✓ Parallelism efficiency
✓ Critical path
✓ Load balancing
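The trade-off between these configurations can be explored with a toy latency model. Everything here is an assumption: all six multiplications are taken to have equal work, per-operation speedup is modeled as c / (1 + alpha(c - 1)) with alpha = 0.1, and the "balanced" configuration is hypothetical rather than the exact allocation of panel d).

```python
from typing import Dict, List


def op_time(work: float, cores: int, alpha: float) -> float:
    """Latency of one multiplication on `cores` cores under an assumed sublinear
    speedup S(c) = c / (1 + alpha * (c - 1)); alpha stands in for sync/bandwidth overhead."""
    speedup = cores / (1 + alpha * (cores - 1))
    return work / speedup


def makespan(waves: List[List[int]], work: float = 1.0,
             alpha: float = 0.1, total_cores: int = 12) -> float:
    """Each wave lists the core counts of operations that run concurrently; waves run
    back to back, and a wave lasts as long as its slowest operation."""
    total = 0.0
    for wave in waves:
        if sum(wave) > total_cores:
            raise ValueError(f"wave {wave} needs more than {total_cores} cores")
        total += max(op_time(work, c, alpha) for c in wave)
    return total


# Six equal-sized multiplications per GRU step; the last one, U_h (r_t * h_{t-1}),
# must wait for r_t, so it sits in its own final wave here.
configs: Dict[str, List[List[int]]] = {
    "a) 1 core per op": [[1, 1, 1, 1, 1], [1]],
    "b) 12 cores per op": [[12]] * 6,
    "c) 4 cores per op": [[4, 4, 4], [4, 4], [4]],
    "balanced (hypothetical)": [[2, 2, 2, 2, 2], [12]],
}
for name, waves in configs.items():
    print(f"{name}: {makespan(waves):.3f}")
```

With alpha = 0 (perfect intra-op scaling), configuration b) would win (0.5 versus about 0.583 time units for the balanced split), so the best allocation hinges on parallelism efficiency, one of the criteria listed for panel d).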
Optimization
Cache-Aware Partitioning
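The slide gives only the title, so the following is an assumed illustration of the general idea: split a weight matrix row-wise across cores so that each core's slice fits in its private cache and can stay resident across time steps instead of being re-read from memory. The 1 MiB per-core cache and the helper name are hypothetical.

```python
from typing import List, Tuple


def partition_rows(rows: int, cols: int, cores: int, cache_bytes: int,
                   bytes_per_weight: int = 4) -> List[Tuple[int, int]]:
    """Split the rows of a (rows x cols) weight matrix into contiguous [start, end)
    ranges, one per core, and check that the largest slice fits the per-core cache
    budget so each core's weights can stay cache-resident across time steps."""
    base, extra = divmod(rows, cores)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for core in range(cores):
        end = start + base + (1 if core < extra else 0)
        ranges.append((start, end))
        start = end
    largest = max(e - s for s, e in ranges) * cols * bytes_per_weight
    if largest > cache_bytes:
        raise ValueError(f"largest slice is {largest} bytes; budget is {cache_bytes} bytes")
    return ranges


# Hypothetical: a GRU with H = 512 has 3H = 1536 recurrent-weight rows of length 512,
# split across 12 cores with an assumed 1 MiB private cache per core.
print(partition_rows(rows=3 * 512, cols=512, cores=12, cache_bytes=1 << 20)[:2])
# [(0, 128), (128, 256)]
```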
Optimization
| DL Scenario | Original Latency | Latency Target | Optimized Latency | Latency Reduction | Throughput Improvement |
|---|---|---|---|---|---|
| Turing Prototype 2 | ~100ms | 10ms | 9ms | >10X | >10X |
| Turing Prototype 3 | ~107ms | 10ms | 4.1ms | >20X | >50X |
| Deep Query Document Similarity | 10~12ms for [query, 1 doc] x 33 docs | 6ms | 1.5ms for [query, 1 doc]; <6ms for [query, 33 docs] | >6X | >30X |
| Malta Click Features | 10ms for [query, 1 passage] x 150 passages | 5ms | <1ms for [query, 1 passage]; <5ms for [query, 150 passages] | >10X | >100X |
| Ads seq2seq model for query rewriting | 51ms | 5ms | 4ms | >10X | >3X |
| AGI Encoder V2 | ~29ms | 10ms | 5.4ms | 5X | 5X |
| RNet (InfoBot + Bing) | ~45ms for 1 [query, passage] | 10ms | 4.0ms for 1 [query, passage]; <8.5ms for 20 [query, passage] | 11X | >100X |
| Bing query tagging | 9~16ms on CNTK | 3ms | 0.95ms | 10X | >10X |
| WideDeepRight Model (TP3 L1) | ~25ms for [query, 1 title url] | 7ms for a batch size of 33 | 5.4ms for [query, 33 title url] | 10X | >100X |
| TP3 L2 Classifier | 60ms | 3ms | 3ms | 20X | 20X |
| TP3 L1 | 8ms | 3ms | 1ms | 8X | 8X |
Optimization
ONNX/WinML
Optimization
Original TensorFlow model vs. TensorFlow model with DeepCPU operator
Optimization
BrainWave
• Pretrained DNN model in TF/CNTK/ONNX, etc.
• Scalable DNN hardware microservice
• BrainWave Soft DPU: instruction decoder & control, neural functional units (Neural FU)
• FPGAs connected through network switches
[Figure: model layers L0 and L1 mapped onto FPGAs.]
Optimization
Production Bing DNN Model 1
| | CPU only | Brainwave accelerated | Improvement |
|---|---|---|---|
| Model details | GRU 128x200 (x2) + W2Vec | LSTM 500x200 (x8) + W2Vec | Brainwave-accelerated model is >10X larger with >10X lower latency |
| End-to-end latency per batch of 1 request, at the 95th percentile | 9ms | 0.85ms | |

Production Bing DNN Model 2
| | CPU only | Brainwave accelerated | Improvement |
|---|---|---|---|
| Model details | 1D CNN + W2Vec (RNNs removed) | 1D CNN + W2Vec + GRU 500x500 (x4) | Brainwave-accelerated model is >10X larger with 3X lower latency |
| End-to-end latency per batch of 1 request, at the 95th percentile | 15ms | 5ms | |
Optimization
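Layer GEMM: the stacked input weights [W_i; W_f; W_o; W_c] (G*H × S) multiply x_t (S × N), giving a G*H × N result.
Recurrent GEMM: the stacked recurrent weights [U_i; U_f; U_o; U_c] (G*H × H) multiply h_{t−1} (H × N), giving a G*H × N result.
S = synthetic_dim, H = hidden_dim, N = batch_size, G = num_gates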
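The split into a layer GEMM and a recurrent GEMM suggests a common optimization: because all inputs x_t are known before the recurrence starts, the layer GEMM can be applied to every time step at once, leaving only the recurrent GEMM inside the sequential loop. The sketch below assumes batch size N = 1, treats S (synthetic_dim) as the input dimension, and stacks gates in the slide's order i, f, o, c; it is illustrative, not the production kernel.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(m x k) @ (k x n) on nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def lstm_layer(xs: List[Vector], W: Matrix, U: Matrix, bias: Vector, H: int) -> List[Vector]:
    """LSTM over a sequence with batch size N = 1 for brevity.
    W is the stacked [W_i; W_f; W_o; W_c] (4H x S), U is [U_i; U_f; U_o; U_c] (4H x H)."""
    T = len(xs)
    X = [list(col) for col in zip(*xs)]          # inputs as an S x T matrix
    # Layer GEMM, hoisted: every x_t is known up front, so one (4H x S) @ (S x T)
    # product replaces T separate matrix-vector products.
    WX = matmul(W, X)                            # 4H x T
    h, c = [0.0] * H, [0.0] * H
    outputs = []
    for t in range(T):
        # Recurrent GEMM: needs h_{t-1}, so it stays inside the sequential loop.
        UH = matmul(U, [[v] for v in h])         # 4H x 1
        g = [WX[j][t] + UH[j][0] + bias[j] for j in range(4 * H)]
        i = [sigmoid(v) for v in g[0:H]]
        f = [sigmoid(v) for v in g[H:2 * H]]
        o = [sigmoid(v) for v in g[2 * H:3 * H]]
        cand = [math.tanh(v) for v in g[3 * H:4 * H]]
        c = [fj * cj + ij * kj for fj, cj, ij, kj in zip(f, c, i, cand)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
        outputs.append(h)
    return outputs


W = [[1.0] for _ in range(4)]   # H = 1, S = 1
U = [[0.0] for _ in range(4)]
bias = [0.0] * 4
print(lstm_layer([[1.0], [0.5]], W, U, bias, H=1))
```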
Optimization
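The recurrent weights (G*H × H) are kept in the register file (RF); h_{t−1}, the results and other variables are kept in shared memory (SMEM).
GRU on P4, FP32, batch_size = 1:
| H | N | RF usage | SMEM usage |
|---|---|---|---|
| 100 | 1 | 3∗100∗100∗4 / (256∗1024) ≅ 46% | (100 + 3∗100)∗4 / (96∗1024) ≅ 2% |
| 20* | 1 | 3∗20∗20∗4 / (256∗1024) ≅ 2% | (20 + 3∗20)∗4 / (96∗1024) ≪ 1% |
*Can add more work in this instance.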
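The percentages in the table can be reproduced with a few lines of arithmetic, assuming FP32 (4-byte) values, G = 3 GRU gates, and the 256 KB register-file and 96 KB shared-memory budgets that appear in the formulas. The batch-size factor in the shared-memory estimate and the helper names are assumptions.

```python
import math


def rf_fraction(H: int, gates: int = 3, bytes_per_value: int = 4,
                rf_bytes: int = 256 * 1024) -> float:
    """Share of the register-file budget taken by `gates` recurrent H x H weight matrices."""
    return gates * H * H * bytes_per_value / rf_bytes


def smem_fraction(H: int, gates: int = 3, batch: int = 1, bytes_per_value: int = 4,
                  smem_bytes: int = 96 * 1024) -> float:
    """Share of the shared-memory budget for h_{t-1} (H values) plus gate results
    (gates * H values), scaled by batch size (the batch factor is an assumption)."""
    return (H + gates * H) * batch * bytes_per_value / smem_bytes


def max_hidden_in_rf(gates: int = 3, bytes_per_value: int = 4,
                     rf_bytes: int = 256 * 1024) -> int:
    """Largest H whose recurrent weights alone fit the register-file budget
    (an upper bound: real kernels also need registers for other values)."""
    return math.isqrt(rf_bytes // (gates * bytes_per_value))


for H in (100, 20):
    print(f"H={H}: RF {rf_fraction(H):.1%}, SMEM {smem_fraction(H):.2%}")
print(max_hidden_in_rf())  # 147
```

At H = 20 the weights use only about 2% of the register budget, which is presumably why the table notes that more work can be added in that instance.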
Summary
• Significant gains from deep learning in search, speech, vision and machine reading comprehension.
• Large-scale, low-latency inference and vector search service in production.
• Heterogeneous hardware and pluggable framework support.
Deep Learning Inference at speed and scale

More Related Content

What's hot

Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
Greg Makowski
 
How Artificial Intelligence & Machine Learning Are Transforming Modern Marketing
How Artificial Intelligence & Machine Learning Are Transforming Modern MarketingHow Artificial Intelligence & Machine Learning Are Transforming Modern Marketing
How Artificial Intelligence & Machine Learning Are Transforming Modern Marketing
CleverTap
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Saurabh Saxena
 
Rasa NLU and ML Interpretability
Rasa NLU and ML InterpretabilityRasa NLU and ML Interpretability
Rasa NLU and ML Interpretability
ztopol
 
Statistical Models for Massive Web Data
Statistical Models for Massive Web DataStatistical Models for Massive Web Data
Statistical Models for Massive Web Data
Deepak Agarwal
 
How Artificial Intelligence & Machine Learning Are Transforming Modern Market...
How Artificial Intelligence & Machine Learning Are Transforming Modern Market...How Artificial Intelligence & Machine Learning Are Transforming Modern Market...
How Artificial Intelligence & Machine Learning Are Transforming Modern Market...
DigiMarCon - Digital Marketing, Media and Advertising Conferences & Exhibitions
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
Tamir Taha
 
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Greg Makowski
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
Dony Riyanto
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
BICA Labs
 
10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems
Xavier Amatriain
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Alok Singh
 
Deep Learning Models for Question Answering
Deep Learning Models for Question AnsweringDeep Learning Models for Question Answering
Deep Learning Models for Question Answering
Sujit Pal
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 Keynote
Roger Barga
 
Towards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTowards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning Benchmark
Turi, Inc.
 
Creating AnswerBot with Keras and TensorFlow (TensorBeat)
Creating AnswerBot with Keras and TensorFlow (TensorBeat)Creating AnswerBot with Keras and TensorFlow (TensorBeat)
Creating AnswerBot with Keras and TensorFlow (TensorBeat)
Avkash Chauhan
 
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning TrackConformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Bhaskar Mitra
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
odsc
 
Dato Keynote
Dato KeynoteDato Keynote
Dato Keynote
Turi, Inc.
 
Learning deep structured semantic models for web search
Learning deep structured semantic models for web searchLearning deep structured semantic models for web search
Learning deep structured semantic models for web search
hyunsung lee
 

What's hot (20)

Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
How Artificial Intelligence & Machine Learning Are Transforming Modern Marketing
How Artificial Intelligence & Machine Learning Are Transforming Modern MarketingHow Artificial Intelligence & Machine Learning Are Transforming Modern Marketing
How Artificial Intelligence & Machine Learning Are Transforming Modern Marketing
 
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate HelpdeskDeep Learning Enabled Question Answering System to Automate Corporate Helpdesk
Deep Learning Enabled Question Answering System to Automate Corporate Helpdesk
 
Rasa NLU and ML Interpretability
Rasa NLU and ML InterpretabilityRasa NLU and ML Interpretability
Rasa NLU and ML Interpretability
 
Statistical Models for Massive Web Data
Statistical Models for Massive Web DataStatistical Models for Massive Web Data
Statistical Models for Massive Web Data
 
How Artificial Intelligence & Machine Learning Are Transforming Modern Market...
How Artificial Intelligence & Machine Learning Are Transforming Modern Market...How Artificial Intelligence & Machine Learning Are Transforming Modern Market...
How Artificial Intelligence & Machine Learning Are Transforming Modern Market...
 
Intro to machine learning
Intro to machine learningIntro to machine learning
Intro to machine learning
 
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...Predictive Model and Record Description with Segmented Sensitivity Analysis (...
Predictive Model and Record Description with Segmented Sensitivity Analysis (...
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
 
10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems
 
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
Big Data Spain 2018: How to build Weighted XGBoost ML model for Imbalance dat...
 
Deep Learning Models for Question Answering
Deep Learning Models for Question AnsweringDeep Learning Models for Question Answering
Deep Learning Models for Question Answering
 
Barga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 KeynoteBarga ACM DEBS 2013 Keynote
Barga ACM DEBS 2013 Keynote
 
Towards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTowards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning Benchmark
 
Creating AnswerBot with Keras and TensorFlow (TensorBeat)
Creating AnswerBot with Keras and TensorFlow (TensorBeat)Creating AnswerBot with Keras and TensorFlow (TensorBeat)
Creating AnswerBot with Keras and TensorFlow (TensorBeat)
 
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning TrackConformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
 
Dato Keynote
Dato KeynoteDato Keynote
Dato Keynote
 
Learning deep structured semantic models for web search
Learning deep structured semantic models for web searchLearning deep structured semantic models for web search
Learning deep structured semantic models for web search
 

Similar to Deep Learning Inference at speed and scale

Javantura v4 - Java and lambdas and streams - are they better than for loops ...
Javantura v4 - Java and lambdas and streams - are they better than for loops ...Javantura v4 - Java and lambdas and streams - are they better than for loops ...
Javantura v4 - Java and lambdas and streams - are they better than for loops ...
HUJAK - Hrvatska udruga Java korisnika / Croatian Java User Association
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
Turi, Inc.
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
MLconf
 
Blinkdb
BlinkdbBlinkdb
Blinkdb
Nitish Upreti
 
Memory efficient java tutorial practices and challenges
Memory efficient java tutorial practices and challengesMemory efficient java tutorial practices and challenges
Memory efficient java tutorial practices and challenges
mustafa sarac
 
Presto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop MeetupPresto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop Meetup
Justin Borgman
 
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
byteLAKE
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017
Jags Ramnarayan
 
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
PyData
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
Subrat Panda, PhD
 
Predicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data AnalyticsPredicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data Analytics
Databricks
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
Manish Pandey
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
ivan provalov
 
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Spark Summit
 
Multidimensional Interfaces for Selecting Data with Order
Multidimensional Interfaces for Selecting Data with OrderMultidimensional Interfaces for Selecting Data with Order
Multidimensional Interfaces for Selecting Data with Order
Ruben Taelman
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Spark Summit
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parameters
University of Huddersfield
 
ESWC2015 - Query Optimization for Clients of Linked Data Fragments
ESWC2015 - Query Optimization for Clients of Linked Data FragmentsESWC2015 - Query Optimization for Clients of Linked Data Fragments
ESWC2015 - Query Optimization for Clients of Linked Data Fragments
Joachim Van Herwegen
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
Julien SIMON
 

Similar to Deep Learning Inference at speed and scale (20)

Javantura v4 - Java and lambdas and streams - are they better than for loops ...
Javantura v4 - Java and lambdas and streams - are they better than for loops ...Javantura v4 - Java and lambdas and streams - are they better than for loops ...
Javantura v4 - Java and lambdas and streams - are they better than for loops ...
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
 
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
Alex Smola, Professor in the Machine Learning Department, Carnegie Mellon Uni...
 
Blinkdb
BlinkdbBlinkdb
Blinkdb
 
Memory efficient java tutorial practices and challenges
Memory efficient java tutorial practices and challengesMemory efficient java tutorial practices and challenges
Memory efficient java tutorial practices and challenges
 
Presto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop MeetupPresto at Tivo, Boston Hadoop Meetup
Presto at Tivo, Boston Hadoop Meetup
 
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
AI optimizing HPC simulations (presentation from  6th EULAG Workshop)AI optimizing HPC simulations (presentation from  6th EULAG Workshop)
AI optimizing HPC simulations (presentation from 6th EULAG Workshop)
 
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...SnappyData, the Spark Database. A unified cluster for streaming, transactions...
SnappyData, the Spark Database. A unified cluster for streaming, transactions...
 
SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017SnappyData at Spark Summit 2017
SnappyData at Spark Summit 2017
 
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
 
AI and Deep Learning
AI and Deep Learning AI and Deep Learning
AI and Deep Learning
 
Predicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data AnalyticsPredicting Optimal Parallelism for Data Analytics
Predicting Optimal Parallelism for Data Analytics
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
 
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
Michigan Information Retrieval Enthusiasts Group Meetup - August 19, 2010
 
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
Ernest: Efficient Performance Prediction for Advanced Analytics on Apache Spa...
 
Multidimensional Interfaces for Selecting Data with Order
Multidimensional Interfaces for Selecting Data with OrderMultidimensional Interfaces for Selecting Data with Order
Multidimensional Interfaces for Selecting Data with Order
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
 
Two methods for optimising cognitive model parameters
Two methods for optimising cognitive model parametersTwo methods for optimising cognitive model parameters
Two methods for optimising cognitive model parameters
 
ESWC2015 - Query Optimization for Clients of Linked Data Fragments
ESWC2015 - Query Optimization for Clients of Linked Data FragmentsESWC2015 - Query Optimization for Clients of Linked Data Fragments
ESWC2015 - Query Optimization for Clients of Linked Data Fragments
 
Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)Deep Dive on Deep Learning (June 2018)
Deep Dive on Deep Learning (June 2018)
 

More from Bill Liu

Walk Through a Real World ML Production Project
Walk Through a Real World ML Production ProjectWalk Through a Real World ML Production Project
Walk Through a Real World ML Production Project
Bill Liu
 
Redefining MLOps with Model Deployment, Management and Observability in Produ...
Redefining MLOps with Model Deployment, Management and Observability in Produ...Redefining MLOps with Model Deployment, Management and Observability in Produ...
Redefining MLOps with Model Deployment, Management and Observability in Produ...
Bill Liu
 
Productizing Machine Learning at the Edge
Productizing Machine Learning at the EdgeProductizing Machine Learning at the Edge
Productizing Machine Learning at the Edge
Bill Liu
 
Transformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to HeroTransformers in Vision: From Zero to Hero
Transformers in Vision: From Zero to Hero
Bill Liu
 
Deep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps WorkflowsDeep AutoViML For Tensorflow Models and MLOps Workflows
Deep AutoViML For Tensorflow Models and MLOps Workflows
Bill Liu
 
Metaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at NetflixMetaflow: The ML Infrastructure at Netflix
Metaflow: The ML Infrastructure at Netflix
Bill Liu
 
Practical Crowdsourcing for ML at Scale
Practical Crowdsourcing for ML at ScalePractical Crowdsourcing for ML at Scale
Practical Crowdsourcing for ML at Scale
Bill Liu
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
Bill Liu
 
Deep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its ApplicationsDeep Reinforcement Learning and Its Applications
Deep Reinforcement Learning and Its Applications
Bill Liu
 
Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19Big Data and AI in Fighting Against COVID-19
Big Data and AI in Fighting Against COVID-19
Bill Liu
 
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world ApplicationsHighly-scalable Reinforcement Learning RLlib for Real-world Applications
Highly-scalable Reinforcement Learning RLlib for Real-world Applications
Bill Liu
 
Build computer vision models to perform object detection and classification w...
Build computer vision models to perform object detection and classification w...Build computer vision models to perform object detection and classification w...
Build computer vision models to perform object detection and classification w...
Bill Liu
 
Causal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine LearningCausal Inference in Data Science and Machine Learning
Causal Inference in Data Science and Machine Learning
Bill Liu
 
Weekly #106: Deep Learning on Mobile
Weekly #106: Deep Learning on MobileWeekly #106: Deep Learning on Mobile
Weekly #106: Deep Learning on Mobile
Bill Liu
 
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
Weekly #105: AutoViz and Auto_ViML Visualization and Machine LearningWeekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
Weekly #105: AutoViz and Auto_ViML Visualization and Machine Learning
Bill Liu
 
AISF19 - On Blending Machine Learning with Microeconomics
AISF19 - On Blending Machine Learning with MicroeconomicsAISF19 - On Blending Machine Learning with Microeconomics
AISF19 - On Blending Machine Learning with Microeconomics
Bill Liu
 
AISF19 - Travel in the AI-First World
AISF19 - Travel in the AI-First WorldAISF19 - Travel in the AI-First World
AISF19 - Travel in the AI-First World
Bill Liu
 
AISF19 - Unleash Computer Vision at the Edge
AISF19 - Unleash Computer Vision at the EdgeAISF19 - Unleash Computer Vision at the Edge
AISF19 - Unleash Computer Vision at the Edge
Bill Liu
 
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
AISF19 - Building Scalable, Kubernetes-Native ML/AI Pipelines with TFX, KubeF...
Bill Liu
 
Toronto meetup 20190917
Toronto meetup 20190917Toronto meetup 20190917
Toronto meetup 20190917
Bill Liu
 

More from Bill Liu (20)

Walk Through a Real World ML Production Project
Walk Through a Real World ML Production ProjectWalk Through a Real World ML Production Project
Walk Through a Real World ML Production Project
 
Redefining MLOps with Model Deployment, Management and Observability in Produ...
Deep Learning Inference at speed and scale

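  • 5. Applications: semantic similarity vs. keyword matching. Vector representation: each query or document is encoded as a dense vector, e.g. (0.78, 0.8, 0.4, 0.3, 0.9, ...) and (0.75, 0.6, 0.1, 0.7, 0.2, ...), and relevant documents are found by nearest neighbor search in semantic space. Bag-of-words pipeline for the query {is it legal for 17 year old to buy a car}: inverted index matching combines query terms and their alterations (car, legal, buy/own, ...) with AND/OR, walks the posting lists (Posting 1, Posting 2, Posting 3, ...), ranks the matches with BM25F at L1, and re-ranks them at L2/L3/L4.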
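The L1 stage above scores documents pulled from posting lists. As a rough illustration, here is a minimal sketch of plain single-field BM25 over a toy inverted index with OR matching. BM25F, which the slide names, additionally weights per-field term frequencies (title, URL, body); that extension is omitted here. The parameter values `k1=1.2` and `b=0.75` are common defaults, assumed rather than taken from the talk, and the function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, term_frequency)."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(text.lower().split()).items():
            index[term].append((doc_id, tf))
    return index


def bm25_scores(query, docs, index, k1=1.2, b=0.75):
    """Score every document matching any query term (OR semantics) with plain BM25.

    Assumption: single-field BM25, not the multi-field BM25F used in production.
    """
    n_docs = len(docs)
    doc_lens = [len(d.split()) for d in docs]
    avg_len = sum(doc_lens) / n_docs
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, [])
        if not postings:
            continue
        df = len(postings)
        # IDF with +1 inside the log so that very common terms never score negative.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        for doc_id, tf in postings:
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1 - b + b * doc_lens[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1) / norm
    # Highest score first, as the candidate list handed to the L2/L3/L4 re-rankers.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

For example, with the toy documents `["legal age to buy a car", "how to own a car", "fruit smoothies near me"]`, the query `"legal buy car"` ranks document 0 first, retrieves document 1 only through the shared term "car", and never touches document 2.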
  • 6. Applications: semantic search can help with recall problems, which account for nearly a third of relevance DSATs (dissatisfaction cases). Example query: {how many women voices in Switchboard telephone corpus}. The good URLs cannot be recalled through query term alteration and term matching. A deep learning model captures the full context of the query and encodes its semantic meaning into a vector, so the query vector and the relevant document vectors lie close together in vector space.
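To make "near in vector space" concrete, here is a minimal sketch of exact (brute-force) nearest neighbor search by cosine similarity. The document IDs and three-dimensional vectors in the usage example are hypothetical stand-ins for real model embeddings, which would have hundreds of dimensions; a production vector service would use an approximate index instead of scanning every vector.

```python
import heapq
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(query_vec, doc_vecs, k=2):
    """Exact top-k search: score every document vector, keep the k best."""
    scored = ((doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Hypothetical embeddings, for illustration only.
doc_vecs = {
    "switchboard_corpus_stats": [0.9, 0.1, 0.3],
    "car_buying_age": [0.1, 0.9, 0.2],
    "smoothie_shops": [0.2, 0.1, 0.9],
}
query = [0.8, 0.2, 0.3]
top = nearest_neighbors(query, doc_vecs, k=1)
```

Here `top` holds the Switchboard document, even though the query shares no terms with it, because the vectors point in nearly the same direction.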
  • 8. Applications: example query "Where's the nearest fruit smoothies", issued from Omaha, Nebraska.
  • 9. Framework: Deep Learning Platform. Workloads: Web, Text, Speech, Image, Enterprise. Requests enter through FrontDoor and are routed to DLIS and DLVS. DLIS pluggable runtime (Linux container or native Windows): Microsoft CNTK, TensorFlow on Windows, DeepCPU, TensorFlow on Linux, Caffe, Theano, ... DLVS pluggable runtime: HNSW, K-D tree, Faiss, with ANN index build on multi-tenancy. Hardware acceleration: CPU, GPU, FPGA. Tooling: self-serve portal, model development toolkit, model repository. Key properties: customizable runtime, privacy and compliance certification, in production globally.
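A "pluggable runtime" is often built as a registry that maps a runtime name to a factory, so new frameworks can be added without changing the serving code. The sketch below shows that pattern under stated assumptions: `RUNTIMES`, `register_runtime`, `load_model`, and the two toy runtimes are all hypothetical stand-ins for real CNTK/TensorFlow/DeepCPU backends, which are not called here, and nothing here claims to be how DLIS is implemented.

```python
from typing import Callable, Dict, List

Predict = Callable[[List[float]], List[float]]

# Hypothetical registry: runtime name -> factory that builds a predict function.
RUNTIMES: Dict[str, Callable[[dict], Predict]] = {}


def register_runtime(name: str):
    """Decorator that plugs a runtime factory into the registry under `name`."""
    def decorator(factory: Callable[[dict], Predict]) -> Callable[[dict], Predict]:
        RUNTIMES[name] = factory
        return factory
    return decorator


@register_runtime("identity")
def make_identity_runtime(config: dict) -> Predict:
    # Toy runtime: returns its input unchanged.
    return lambda inputs: list(inputs)


@register_runtime("scale")
def make_scale_runtime(config: dict) -> Predict:
    # Toy runtime: multiplies every input by a configured factor.
    factor = config.get("factor", 1.0)
    return lambda inputs: [factor * x for x in inputs]


def load_model(model_spec: dict) -> Predict:
    """Pick the runtime declared by the model spec and build its predictor."""
    runtime = model_spec["runtime"]
    if runtime not in RUNTIMES:
        raise ValueError(f"no runtime registered for {runtime!r}")
    return RUNTIMES[runtime](model_spec.get("config", {}))
```

For example, `load_model({"runtime": "scale", "config": {"factor": 2.0}})([1.0, 2.0])` returns `[2.0, 4.0]`. The serving code only depends on `load_model`, which keeps the framework choice a per-model setting.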
  • 10. Framework: production scale of millions of queries per second (QPS), hundreds of models, hundreds of billions of vectors, and 20+ regions.
  • 11. Framework: optimal distribution to match model requirements to the server fleet. [Figure: Model1 through Model8 placed on Windows machines SKU-1, SKU-2, SKU-3 and Linux machines SKU-4, SKU-5.] Multiple instances of a model run across multiple machines, and multiple model instances share the same machine. Different operating systems, different runtimes (CNTK, TensorFlow on Windows, DeepCPU, TensorFlow on Linux), and different machine SKUs coexist in the same bed.
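One simple way to approximate this kind of placement is first-fit decreasing bin packing with compatibility constraints. The sketch below is a swapped-in heuristic, not the platform's actual scheduler. It assumes each machine has an OS, a set of supported runtimes, and a memory capacity, and that replicas of one model must land on distinct machines. All capacities, sizes, and names in the usage example are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    name: str
    os: str
    runtimes: frozenset
    capacity_gb: float
    used_gb: float = 0.0
    placed: list = field(default_factory=list)


@dataclass
class ModelSpec:
    name: str
    os: str
    runtime: str
    memory_gb: float
    replicas: int


def place(models, machines):
    """First-fit decreasing with OS/runtime constraints.

    Replicas of the same model are spread over distinct machines (anti-affinity),
    while different models may share a machine.
    """
    # Largest models first, so big models are not left without room.
    for model in sorted(models, key=lambda m: m.memory_gb, reverse=True):
        for _ in range(model.replicas):
            for machine in machines:
                fits = machine.used_gb + model.memory_gb <= machine.capacity_gb
                compatible = machine.os == model.os and model.runtime in machine.runtimes
                if fits and compatible and model.name not in machine.placed:
                    machine.placed.append(model.name)
                    machine.used_gb += model.memory_gb
                    break
            else:
                raise RuntimeError(f"cannot place a replica of {model.name}")
    return {m.name: list(m.placed) for m in machines}


# Hypothetical fleet and models, for illustration only.
machines = [
    Machine("win-sku1", "windows", frozenset({"cntk", "deepcpu"}), 64),
    Machine("win-sku2", "windows", frozenset({"cntk", "tensorflow"}), 32),
    Machine("linux-sku4", "linux", frozenset({"tensorflow"}), 64),
]
models = [
    ModelSpec("Model1", "windows", "cntk", 20, 2),
    ModelSpec("Model2", "windows", "tensorflow", 10, 1),
    ModelSpec("Model7", "linux", "tensorflow", 30, 1),
]
result = place(models, machines)
```

In this toy run, Model1's two replicas land on different Windows machines, Model2 shares `win-sku2` with one of them, and Model7 goes to the only Linux machine. A real scheduler would also weigh latency targets, load, and hardware accelerators.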
  • 13. Framework: vector recall by nearest neighbor search. Hashing approach: hash the query to a bucket, then search only among the points in that bucket. [Figure: semantic words 1-3 as points in vector space.] Index structures shown: NNG, HNSW, KD-tree, TP-tree. Reference: Wang, Jingdong, and Shipeng Li. "Query-driven iterated neighborhood graph search for large scale indexing." Proceedings of the 20th ACM International Conference on Multimedia. ACM, 2012.
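The "hash the query to a bucket, then search among points in the bucket" idea can be sketched with random-hyperplane locality-sensitive hashing (SimHash), one standard hashing scheme for cosine similarity. The slide does not say which hash family is used, so this choice is an assumption. The graph- and tree-based indexes it lists (NNG, HNSW, KD-tree, TP-tree) work differently and are not shown. `RandomHyperplaneLSH` is a hypothetical class name.

```python
import math
import random
from collections import defaultdict


def _cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class RandomHyperplaneLSH:
    """Bucket vectors by the sign pattern of their dot products with random hyperplanes.

    Vectors separated by a small angle tend to share a sign pattern, so they tend
    to land in the same bucket.
    """

    def __init__(self, dim, num_planes=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]
        self.buckets = defaultdict(list)

    def _hash(self, vec):
        # One bit per hyperplane: which side of the plane the vector falls on.
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in self.planes)

    def add(self, key, vec):
        self.buckets[self._hash(vec)].append((key, vec))

    def query(self, vec, k=1):
        """Hash the query to its bucket, then rank only the points in that bucket."""
        candidates = self.buckets.get(self._hash(vec), [])
        ranked = sorted(candidates, key=lambda item: _cosine(vec, item[1]), reverse=True)
        return [key for key, _ in ranked[:k]]
```

For example, after `index.add("a", [1.0, 2.0, 3.0])`, querying with `[2.0, 4.0, 6.0]` hashes to the same bucket, because scaling a vector by a positive factor preserves every sign bit, and returns `["a"]`. The trade-off is that a true neighbor sitting in a different bucket is missed. Real systems mitigate this with multiple hash tables or by probing nearby buckets.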
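  • 15. Optimization: RNN serving performance challenges. RNNs are used for language modeling, machine translation, machine reading comprehension, conversation bots, speech recognition, and more. Limited parallelism: small batch size and sequential dependency between time steps. Limited bandwidth: vector-matrix multiplication with low data reuse. [Figure: RNN unrolled over time, with inputs X_{t-1}, X_t, X_{t+1}, states S_{t-1}, S_t, S_{t+1}, outputs O_{t-1}, O_t, O_{t+1}, and weight matrices U, W, V shared across steps.]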
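Why "low data reuse" matters can be shown with a back-of-envelope arithmetic intensity estimate (FLOPs per byte moved) for one recurrent step, an (H × H) weight matrix times an (H × N) activation block. This is a simplified roofline-style count that assumes FP32 values, counts each weight, input, and output byte once, and ignores caches. `gemm_arithmetic_intensity` is a hypothetical helper.

```python
def gemm_arithmetic_intensity(hidden, batch, bytes_per_value=4):
    """FLOPs per byte for (H x H) weights times (H x N) activations.

    Simplifying assumptions: every value is moved exactly once, FP32 by default.
    """
    flops = 2 * hidden * hidden * batch  # one multiply and one add per weight per column
    bytes_moved = bytes_per_value * (
        hidden * hidden    # read the weight matrix
        + hidden * batch   # read the input activations
        + hidden * batch   # write the output activations
    )
    return flops / bytes_moved


for batch in (1, 8, 64):
    print(f"H=500, N={batch}: {gemm_arithmetic_intensity(500, batch):.2f} FLOPs/byte")
```

With batch size 1 the step is a vector-matrix product at about 0.5 FLOPs per byte: each weight is loaded to perform a single multiply-add, so memory bandwidth, not compute, bounds latency. Larger batches reuse each loaded weight N times, but online serving often cannot wait to form them.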
  • 16. Optimization techniques: 1. matrix computation; 2. activation function; 3. operation fusing; 4. affinity; 5. locality; 6. parallelism; 7. task scheduling. In collaboration with Yuxiong He, Minjia Zhang, Samyam Rajbhandari, and Wenhan Wang, Microsoft AI and Research.
  • 17. Optimization: the GRU cell computes
$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$
$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$
$$h_t = z_t \circ h_{t-1} + (1 - z_t) \circ \tanh\big(W_h x_t + U_h (r_t \circ h_{t-1}) + b_h\big)$$
On a machine with 12 cores, the six matrix multiplications can be scheduled in different ways. a) 1 core per operation, with the multiplications done in parallel: many cores sit idle. b) 12 cores per operation, with the multiplications done sequentially: unbalanced load and poor speedup from intra-op parallelism.
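The GRU equations above translate directly into code. Below is a minimal pure-Python sketch of one GRU step and a loop over a sequence. It makes explicit both the six matrix-vector products per step that the scheduling discussion refers to and the sequential dependency on h_{t-1}. The parameter-dictionary layout and `zero_params` are hypothetical choices for illustration. Optimized runtimes would fuse these products rather than run them one by one.

```python
import math


def matvec(m, v):
    """Dense matrix (list of rows) times vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(x_t, h_prev, p):
    """One GRU step: update gate z, reset gate r, then the gated state update."""
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["W_z"], x_t), matvec(p["U_z"], h_prev), p["b_z"])]
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(p["W_r"], x_t), matvec(p["U_r"], h_prev), p["b_r"])]
    gated = [ri * hi for ri, hi in zip(r, h_prev)]  # r_t ∘ h_{t-1}
    cand = [math.tanh(a + b + c) for a, b, c in zip(matvec(p["W_h"], x_t), matvec(p["U_h"], gated), p["b_h"])]
    # h_t = z_t ∘ h_{t-1} + (1 - z_t) ∘ candidate
    return [zi * hi + (1.0 - zi) * ci for zi, hi, ci in zip(z, h_prev, cand)]


def gru_sequence(xs, h0, p):
    h = h0
    for x_t in xs:  # each step needs the previous h, so steps cannot run in parallel
        h = gru_step(x_t, h, p)
    return h


def zero_params(input_dim, hidden_dim):
    """Hypothetical helper: all-zero weights and biases, handy for checking the wiring."""
    def zeros(rows, cols):
        return [[0.0] * cols for _ in range(rows)]
    p = {}
    for gate in ("z", "r", "h"):
        p[f"W_{gate}"] = zeros(hidden_dim, input_dim)
        p[f"U_{gate}"] = zeros(hidden_dim, hidden_dim)
        p[f"b_{gate}"] = [0.0] * hidden_dim
    return p
```

With all-zero parameters, both gates evaluate to 0.5 and the candidate to 0, so each step halves the state: `gru_step([1.0, 2.0, 3.0], [1.0, -2.0], zero_params(3, 2))` returns `[0.5, -1.0]`.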
  • 18. Optimization: the same GRU equations as in slide 17, on a 12-core machine. c) 4 cores per operation: the scheduling order is bad. d) An optimized configuration that reduces latency by giving different operations different numbers of cores. The configuration accounts for workload size, parallelism efficiency, the critical path, and load balancing.
  • 23. Optimization: latency and throughput results for production DL scenarios.

| DL scenario | Original latency | Latency target | Optimized latency | Latency reduction | Throughput improvement |
|---|---|---|---|---|---|
| Turing Prototype 2 | ~100 ms | 10 ms | 9 ms | >10X | >10X |
| Turing Prototype 3 | ~107 ms | 10 ms | 4.1 ms | >20X | >50X |
| Deep Query Document Similarity | 10-12 ms for [query, 1 doc] x 33 docs | 6 ms | 1.5 ms for [query, 1 doc]; <6 ms for [query, 33 docs] | >6X | >30X |
| Malta Click Features | 10 ms for [query, 1 passage] x 150 passages | 5 ms | <1 ms for [query, 1 passage]; <5 ms for [query, 150 passages] | >10X | >100X |
| Ads seq2seq model for query rewriting | 51 ms | 5 ms | 4 ms | >10X | >3X |
| AGI Encoder V2 | ~29 ms | 10 ms | 5.4 ms | 5X | 5X |
| RNet (InfoBot + Bing) | ~45 ms for 1 [query, passage] | 10 ms | 4.0 ms for 1 [query, passage]; <8.5 ms for 20 [query, passage] | 11X | >100X |
| Bing query tagging | 9-16 ms on CNTK | 3 ms | 0.95 ms | 10X | >10X |
| WideDeepRight Model (TP3 L1) | ~25 ms for [query, 1 title url] | 7 ms for a batch size of 33 | 5.4 ms for [query, 33 title url] | 10X | >100X |
| TP3 L2 Classifier | 60 ms | 3 ms | 3 ms | 20X | 20X |
| TP3 L1 | 8 ms | 3 ms | 1 ms | 8X | 8X |
  • 25. Optimization: original TensorFlow model vs. TensorFlow model with a DeepCPU operator.
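  • 26. Optimization: BrainWave. A pretrained DNN model in TF/CNTK/ONNX, etc. is served on a scalable DNN hardware microservice. [Figure: model layers L0 and L1 mapped onto FPGAs (F) connected by network switches.] The BrainWave soft DPU consists of an instruction decoder & control and a neural functional unit (FU).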
  • 28. Optimization: production Bing DNN models, CPU only vs. Brainwave accelerated.

| Production Bing DNN model | Configuration | Model details | End-to-end latency per batch-1 request (95th percentile) |
|---|---|---|---|
| Model 1 | CPU only | GRU 128x200 (x2) + W2Vec | 9 ms |
| Model 1 | Brainwave accelerated | LSTM 500x200 (x8) + W2Vec | 0.85 ms |
| Model 2 | CPU only | 1D CNN + W2Vec (RNNs removed) | 15 ms |
| Model 2 | Brainwave accelerated | 1D CNN + W2Vec + GRU 500x500 (x4) | 5 ms |

Improvement: for Model 1, the Brainwave-accelerated model is >10X larger and has >10X lower latency; for Model 2, it is >10X larger and has 3X lower latency.
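  • 29. Optimization: LSTM GEMM shapes. Layer GEMM: the stacked input weights [W_i; W_f; W_o; W_c], of shape (G*H) x S, multiply x_t, of shape S x N, giving a (G*H) x N result. Recurrent GEMM: the stacked recurrent weights [U_i; U_f; U_o; U_c], of shape (G*H) x H, multiply h_{t-1}, of shape H x N, giving a (G*H) x N result. Here S = synthetic_dim, H = hidden_dim, N = batch_size, G = num_gates.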
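Stacking the per-gate weight matrices turns G small GEMMs into one larger GEMM with the same result, which is the layout described above. Here is a minimal pure-Python sketch that checks this equivalence on tiny integer matrices. The values are arbitrary, the helper names are hypothetical, and a real kernel would call an optimized BLAS or GPU GEMM instead of this loop.

```python
def matmul(a, b):
    """(rows x k) times (k x cols) with plain nested lists."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def fused_gate_gemm(gate_weights, x):
    """Stack G gate matrices (each H x S) into one (G*H) x S matrix, run a single GEMM,
    then split the (G*H) x N result back into G blocks of H x N."""
    stacked = [row for w in gate_weights for row in w]
    out = matmul(stacked, x)
    h = len(gate_weights[0])
    return [out[g * h:(g + 1) * h] for g in range(len(gate_weights))]


# Arbitrary example with G=4 gates (i, f, o, c), H=2, S=3, N=2.
W = [
    [[1, 0, 2], [0, 1, 0]],  # W_i
    [[2, 1, 0], [1, 1, 1]],  # W_f
    [[0, 0, 1], [3, 0, 0]],  # W_o
    [[1, 2, 3], [0, 0, 2]],  # W_c
]
x = [[1, 2], [0, 1], [3, 0]]  # S x N
fused = fused_gate_gemm(W, x)
separate = [matmul(w, x) for w in W]
```

Here `fused == separate` holds. The benefit of the fused form is a single, larger operation that streams the input once and parallelizes better than four small ones. The same stacking applies to the recurrent weights U.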
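  • 30. Optimization: GRU on P4, FP32, batch_size = 1. [Figure: the register file (RF) holds the weights (H x G*H) and other variables; shared memory (SMEM) holds h_{t-1} and the result.]

| H | N | RF usage | SMEM usage |
|---|---|---|---|
| 100 | 1 | 3*100*100*4 / (256*1024) ≅ 46% | (100 + 3*100)*4 / (96*1024) ≅ 2% |
| 20 | 1 | 3*20*20*4 / (256*1024) ≅ 2% | (20 + 3*20)*4 / (96*1024) ≪ 1% |

*More work can be added to this instance.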
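The utilization figures in this table follow from simple arithmetic. The sketch below reproduces it, using the slide's capacities of 256 KB register file and 96 KB shared memory, 4-byte FP32 values, and G = 3 GRU gates. Following the slide's formulas, it counts the recurrent weights against the register file and h_{t-1} plus the gate results against shared memory, for batch size 1. `gru_onchip_footprint` is a hypothetical helper.

```python
RF_BYTES = 256 * 1024    # register file capacity used on the slide
SMEM_BYTES = 96 * 1024   # shared memory capacity used on the slide
BYTES_FP32 = 4


def gru_onchip_footprint(hidden, gates=3):
    """Return (register file fraction, shared memory fraction) for batch size 1.

    Register file: the G*H*H recurrent weights.
    Shared memory: h_{t-1} (H values) plus the gate results (G*H values).
    """
    rf = gates * hidden * hidden * BYTES_FP32 / RF_BYTES
    smem = (hidden + gates * hidden) * BYTES_FP32 / SMEM_BYTES
    return rf, smem


for h in (100, 20):
    rf, smem = gru_onchip_footprint(h)
    print(f"H={h}: RF {rf:.1%}, SMEM {smem:.2%}")
```

For H = 100 the weights take about 46% of the register file, while H = 20 uses only about 2%. The small-H case leaves room on the chip, which is the point of the note that more work can be added to that instance.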
  • 31. Summary: deep learning brings significant gains in search, speech, vision, and machine reading comprehension. A large-scale, low-latency inference and vector search service runs in production, with support for heterogeneous hardware and pluggable frameworks.