© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Amazon Elastic Inference – Reduce Deep
Learning inference costs by 75%
AIM366
Dominic Divakaruni
Sr. Product Manager
Sudipta Sengupta
Sr. Principal Technologist
AWS – Machine Learning
Peter Jones
Head of AI Engineering
Liviu Calin
AI Systems Engineer
Autodesk AI Lab
Agenda
❖ Challenges scaling deep learning applications
❖ Our solution that addresses the cost efficiency and flexibility challenges
❖ Share Autodesk’s experience
Machine learning – the centerpiece for transformation
• Customer experience
• Business operations
• Decision-making
• Innovation
• Competitive advantage
Inference (prediction): 90%
Training: 10%
The challenges of inference in production
A closer look at GPU utilization for inference
[Chart omitted] 90% underutilized for single batch size inference
More sessions don’t solve the problem
[Chart omitted]
How cost effective are GPU instances for inference?
Smaller P2 instances are more effective for real-time inference with small batch sizes
How do we optimize resources and reduce costs?
Introducing
Amazon Elastic Inference
• Integrated with Amazon EC2 and Amazon SageMaker
• Support for TensorFlow, Apache MXNet, and ONNX, with PyTorch coming soon
• Single and mixed-precision operations
Reduce deep learning inference costs by up to 75%
Acceleration sizes tailored for inference
Accelerator type   FP32 throughput (TOPS)   FP16 throughput (TOPS)   Accelerator memory (GB)   Price ($/hr, US)
eia1.medium        1                        8                        1                         $0.13
eia1.large         2                        16                       2                         $0.26
eia1.xlarge        4                        32                       4                         $0.52
Now available in N. Virginia, Ohio, Oregon, Dublin, Tokyo, and Seoul
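To make the pricing concrete, here is a rough sketch of how the accelerator price feeds into total hourly cost and cost per inference. The accelerator prices come from the table above; the host instance price and the throughput figure are hypothetical placeholders, not numbers from this talk.

```python
# Accelerator prices ($/hr, US) taken from the table above.
EIA_PRICE_PER_HOUR = {
    "eia1.medium": 0.13,
    "eia1.large": 0.26,
    "eia1.xlarge": 0.52,
}


def hourly_cost(instance_price_per_hour, accelerator_type):
    """Total hourly cost of a host instance plus one attached accelerator."""
    return instance_price_per_hour + EIA_PRICE_PER_HOUR[accelerator_type]


def cost_per_million_inferences(total_price_per_hour, inferences_per_second):
    """Cost of one million inferences at a sustained throughput."""
    inferences_per_hour = inferences_per_second * 3600
    return total_price_per_hour / inferences_per_hour * 1_000_000


# Hypothetical inputs: a $0.10/hr host instance sustaining 25 inferences/s.
total = hourly_cost(0.10, "eia1.medium")
print(f"hourly cost: ${total:.2f}")
print(f"per million inferences: ${cost_per_million_inferences(total, 25.0):.2f}")
```

Swapping in measured throughput for each instance and accelerator pairing gives a like-for-like cost comparison against a GPU instance.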
Inference Performance with EI and GPU
[Charts omitted]
How does Elastic Inference work with Amazon EC2?
[Diagram labels: Region, VPC, Availability Zone]
Scale capacity in EC2 Auto Scaling groups
Auto Scaling group
How does Elastic Inference work with SageMaker?
SageMaker Notebooks
SageMaker Hosted Endpoints
Model Support
• Amazon EI enabled TensorFlow Serving
• Amazon EI enabled Apache MXNet
• ONNX, applied using Apache MXNet
Loading models and serving requests
TensorFlow models using Amazon EI enabled TensorFlow Serving
# Start the EI-enabled model server
AmazonEI_TensorFlow_Serving_v1.11_v1 --model_name=inception --model_base_path=[model location] --port=9000
# Send a test image from a client process
python inception_client.py --server=localhost:9000 --image Siberian_Husky_bi-eyed_Flickr.jpg
Load MXNet models using Amazon EI enabled Apache MXNet
import mxnet as mx  # the Amazon EI enabled build, which provides mx.eia()
# Load a saved checkpoint (symbol plus parameters) at epoch 0
sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
# Pass mx.eia() as the context when creating the Module object
mod = mx.mod.Module(symbol=sym, context=mx.eia(), label_names=None)
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))],
         label_shapes=mod._label_shapes)
mod.set_params(arg_params, aux_params, allow_missing=True)
Load ONNX models using Amazon EI enabled Apache MXNet
from mxnet.contrib import onnx as onnx_mxnet
# For ONNX models, use MXNet's import_model API as follows:
sym, arg, aux = onnx_mxnet.import_model(onnx_model_file)
# Pass mx.eia() as the context when creating the Module object
mod = mx.mod.Module(symbol=sym, context=mx.eia())
How to choose?
Considerations as you choose an instance and accelerator type combination for your model:
➢ What is the target latency SLA for your application, and what are your constraints?
➢ Start small and size up if you need more capacity.
➢ Input/output data payload has an impact on latency.
➢ Convert to FP16 for lower latency and higher throughput.
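"Start small and size up" can be sketched as a walk over the accelerator sizes from smallest to largest, stopping at the first one that meets the latency target. The measured latencies below are made-up placeholders; in practice they would come from benchmarking your own model on each size.

```python
# Accelerator sizes ordered from smallest (cheapest) to largest.
SIZES_SMALL_TO_LARGE = ["eia1.medium", "eia1.large", "eia1.xlarge"]


def smallest_size_meeting_sla(measured_latency_ms, target_latency_ms):
    """Return the first size whose measured latency meets the target, else None."""
    for size in SIZES_SMALL_TO_LARGE:
        latency = measured_latency_ms.get(size)
        if latency is not None and latency <= target_latency_ms:
            return size
    return None


# Hypothetical benchmark results (milliseconds per request) for one model.
example_latencies = {"eia1.medium": 180.0, "eia1.large": 95.0, "eia1.xlarge": 60.0}
print(smallest_size_meeting_sla(example_latencies, target_latency_ms=100.0))
```

A result of None means no accelerator size met the target, which would be the cue to revisit the host instance type, the payload size, or FP16 conversion.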
Peter Jones
Head of AI Engineering
Autodesk AI Lab
Liviu-Mihai Calin
AI Systems Engineer
Autodesk AI Lab
MORE
IS INEVITABLE
LESS
IS A REALITY
Image courtesy of Tesla Motors, Inc. Image courtesy of Gensler.
The Martian © 2015 Twentieth Century Fox. All rights reserved.
OPPORTUNITY OF
BETTER
AI LAB
Multi-view Convolutional Neural Network (MVCNN)
[Diagram] 2d-conv, 2d-conv, 2d-conv, 2d-conv, 2d-conv, batch-max, dense, dense, dense
[Diagram labels] Embedding, Softmax Classifier
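The "batch-max" stage is presumably the view-pooling step of a multi-view CNN, where features computed independently for each view are merged by an element-wise maximum across views. That reading is an assumption based on the layer name; the slide does not spell it out. A minimal pure-Python sketch of the idea:

```python
def max_pool_across_views(view_features):
    """Element-wise maximum over a list of equal-length per-view feature vectors."""
    if not view_features:
        raise ValueError("need at least one view")
    length = len(view_features[0])
    if any(len(v) != length for v in view_features):
        raise ValueError("all views must have the same feature length")
    return [max(values) for values in zip(*view_features)]


# Three hypothetical views, each described by a 4-element feature vector.
views = [
    [0.1, 0.9, 0.3, 0.0],
    [0.4, 0.2, 0.8, 0.1],
    [0.2, 0.5, 0.7, 0.6],
]
pooled = max_pool_across_views(views)
print(pooled)  # one vector, regardless of how many views went in
```

Because the result has the same length whatever the number of views, the dense layers that follow can have a fixed input size.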
Variational Autoencoder (VAE)
Instance Setup with Elastic Inference
aws ec2 run-instances \
  --image-id <preconfigured_ami_id> \
  --instance-type <ec2_instance_type> \
  --key-name <key_name> \
  --subnet-id <subnet_id> \
  --security-group-ids <security_group_id> \
  --iam-instance-profile Name="iam_profile_name" \
  --elastic-inference-accelerator Type=eia1.<size>
• Just like setting up a normal EC2 instance
• Create the instance with a preconfigured AMI and a reference to the accelerator
• Set up a VPC endpoint so the EC2 instance can connect to the accelerator (done once)
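When launches are scripted, the same command can be assembled in Python. This is only a sketch: it builds the argument list for the CLI call shown above using the flags from the slide, and every identifier in the example (AMI, subnet, security group, profile name, instance type) is a hypothetical placeholder.

```python
import shlex


def build_run_instances_command(image_id, instance_type, key_name, subnet_id,
                                security_group_id, iam_profile_name,
                                accelerator_size):
    """Assemble the argument list for the `aws ec2 run-instances` call above."""
    return [
        "aws", "ec2", "run-instances",
        "--image-id", image_id,
        "--instance-type", instance_type,
        "--key-name", key_name,
        "--subnet-id", subnet_id,
        "--security-group-ids", security_group_id,
        "--iam-instance-profile", f"Name={iam_profile_name}",
        "--elastic-inference-accelerator", f"Type=eia1.{accelerator_size}",
    ]


# Hypothetical identifiers; substitute your own before running the command.
command = build_run_instances_command(
    image_id="ami-EXAMPLE",
    instance_type="c5.large",
    key_name="my-key",
    subnet_id="subnet-EXAMPLE",
    security_group_id="sg-EXAMPLE",
    iam_profile_name="my-ei-profile",
    accelerator_size="medium",
)
print(shlex.join(command))
```

Passing the list to `subprocess.run(command, check=True)` would execute it, assuming the AWS CLI is installed and configured with credentials.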
Using Elastic Inference
• Serve saved model with EI version of TensorFlow model server
• Send requests to the server to predict with test data
• Elastic Inference takes care of accelerating the operations
Creating a Saved Model
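import tensorflow as tf  # TensorFlow 1.x API
classifier = tf.estimator.Estimator(…)
# Placeholder describing the serving input: one sample of 80 grayscale 128x128 views
input_tensor = tf.placeholder(dtype=tf.float32,
                              shape=[1, 80, 128, 128, 1],
                              name='images_tensor')
input_map = {'images': input_tensor}
# Export the trained estimator as a SavedModel that accepts raw tensors
classifier.export_savedmodel(
    model_dir,
    tf.estimator.export.build_raw_serving_input_receiver_fn(input_map))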
• The MVCNN model is in TF Estimator format and has been trained
• It expects grayscale multi-view images named “images” as input
• Dimensions: [batch_size , num_views , width , height , color_channels]
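Since payload size affects latency, it helps to know how large one request is. The sketch below computes the raw tensor size for the input shape used above, assuming dense float32 data and ignoring any serialization or transport overhead.

```python
import math

# Input shape from the slide: [batch_size, num_views, width, height, color_channels]
INPUT_SHAPE = (1, 80, 128, 128, 1)
BYTES_PER_FLOAT32 = 4


def payload_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Raw size in bytes of a dense tensor with the given shape."""
    return math.prod(shape) * bytes_per_element


size = payload_bytes(INPUT_SHAPE)
print(f"{size} bytes = {size / 2**20:.1f} MiB per request")
```

For this shape the result is 5,242,880 bytes, or 5 MiB per single-sample request.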
Predicting with EI TensorFlow Serving
AmazonEI_TensorFlow_Serving_v1.11_v1 --model_name=mvcnn \
  --model_base_path=model_dir --port=9000
• Have one process serve the previously exported saved model
• Have another process send requests containing input data
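import grpc
import tensorflow as tf  # TensorFlow 1.x API
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc
tf.app.flags.DEFINE_string('server', 'localhost:9000',
                           'PredictionService host:port')
FLAGS = tf.app.flags.FLAGS
# Open a gRPC channel to the model server and create a prediction stub
channel = grpc.insecure_channel(FLAGS.server)
stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
# Build the request for the 'mvcnn' model's default serving signature
request = predict_pb2.PredictRequest()
request.model_spec.name = 'mvcnn'
request.model_spec.signature_name = 'serving_default'
input_array = get_next_input()  # user-supplied helper, not shown on the slide
request.inputs['images'].CopyFrom(
    tf.contrib.util.make_tensor_proto(input_array, dtype=tf.float32,
                                      shape=[1, 80, 128, 128, 1]))
result = stub.Predict(request, 30.0)  # 30-second timeout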
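Inference time per request can be measured by wrapping the prediction call in a timer. The sketch below is generic over any callable; `fake_predict` is a hypothetical stand-in that sleeps instead of contacting a server, and in real use it would be replaced by something like `lambda: stub.Predict(request, 30.0)`.

```python
import statistics
import time


def time_calls(predict_fn, num_requests=20):
    """Call predict_fn repeatedly and return per-call latencies in seconds."""
    latencies = []
    for _ in range(num_requests):
        start = time.perf_counter()
        predict_fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Return mean and worst-case latency in seconds."""
    return {"mean": statistics.mean(latencies), "max": max(latencies)}


def fake_predict():
    """Hypothetical stand-in for a real prediction call."""
    time.sleep(0.001)


print(summarize(time_calls(fake_predict, num_requests=5)))
```

Discarding the first few calls before summarizing is often worthwhile, since initial requests can include one-time warm-up cost.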
Results - MVCNN
[Chart omitted: inference time (seconds) and hourly cost ($)]
Results - VAE
[Chart omitted: inference time (seconds) and hourly cost ($)]
Autodesk, the Autodesk logo, and Revit are registered trademarks or trademarks of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders. Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document.
© 2018 Autodesk. All rights reserved.
Summary
• EI accelerators are available in a range of sizes suited to inference workloads.
• Configure an accelerator to launch with any EC2 instance type, and scale capacity with Auto Scaling groups.
• EI configuration is also available through CloudFormation as you configure your instance resource.
• Deploy TensorFlow, MXNet, and ONNX models with no code changes.
• Integrated with SageMaker for a fully managed experience.
aws.amazon.com/machine-learning/elastic-inference/
Thank you!
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateAmazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSAmazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsAmazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareAmazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSAmazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAmazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareAmazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWSAmazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckAmazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without serversAmazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceAmazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

[NEW LAUNCH!] Introducing Amazon Elastic Inference: Reduce Deep Learning Inference Cost up to 75% (AIM366) - AWS re:Invent 2018

  • 1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Elastic Inference: Reduce Deep Learning inference costs by 75% (AIM366). Speakers: Dominic Divakaruni, Sr. Product Manager, and Sudipta Sengupta, Sr. Principal Technologist (AWS Machine Learning); Peter Jones, Head of AI Engineering, and Liviu Calin, AI Systems Engineer (Autodesk AI Lab).
  • 2. Agenda
      ❖ Challenges in scaling deep learning applications
      ❖ Our solution to the cost-efficiency and flexibility challenges
      ❖ Autodesk's experience
  • 3. Machine learning, the centerpiece for transformation: customer experience, business operations, decision-making, innovation, competitive advantage.
  • 4. [Chart] Inference (prediction): 90%. Training: 10%.
  • 5. The challenges of inference in production
  • 6. A closer look at GPU utilization for inference. [Chart] Callout: 90% underutilized for inference with a batch size of 1.
  • 7. [Chart] More sessions don't solve the problem.
  • 8. How cost effective are GPU instances for inference? Smaller P2 instances are more effective for real-time inference with small batch sizes.
  • 9. How do we optimize resources and reduce costs?
  • 10. How do we optimize resources and reduce costs? Introducing
  • 11. Amazon Elastic Inference
      • Integrated with Amazon EC2 and Amazon SageMaker
      • Support for TensorFlow, Apache MXNet, and ONNX, with PyTorch coming soon
      • Single and mixed-precision operations
      • Reduce deep learning inference costs by up to 75%
  • 12. Acceleration sizes tailored for inference
      Accelerator type   FP32 throughput (TOPS)   FP16 throughput (TOPS)   Accelerator memory (GB)   Price ($/hr, US)
      eia1.medium        1                        8                        1                         $0.13
      eia1.large         2                        16                       2                         $0.26
      eia1.xlarge        4                        32                       4                         $0.52
      Now available in N. Virginia, Ohio, Oregon, Dublin, Tokyo, and Seoul.
  • 13. Inference Performance with EI and GPU [Charts]
  • 14. How does Elastic Inference work with Amazon EC2? [Diagram labels: Region, VPC, Availability Zone]
  • 15. Scale capacity in EC2 Auto Scaling groups [Diagram label: Auto Scaling group]
  • 16. How does Elastic Inference work with SageMaker? [Diagram labels: SageMaker Notebooks, SageMaker Hosted Endpoints]
  • 17. Model Support: Amazon EI enabled TensorFlow Serving; Amazon EI enabled Apache MXNet; ONNX, applied using Apache MXNet.
  • 18. Loading models and serving requests: TensorFlow models using Amazon EI enabled TensorFlow Serving
      # Start the EI-enabled model server
      AmazonEI_TensorFlow_Serving_v1.11_v1 --model_name=inception --model_base_path=[model location] --port=9000
      # From a second process, send a test image to the server
      python inception_client.py --server=localhost:9000 --image Siberian_Husky_bi-eyed_Flickr.jpg
  • 19. Loading models and serving requests
      Load MXNet models using Amazon EI enabled Apache MXNet:
      import mxnet as mx
      # Example for MXNet models: load a checkpoint and bind it for inference
      sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-152', 0)
      mod = mx.mod.Module(symbol=sym, context=mx.eia(), label_names=None)
      mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))], label_shapes=mod._label_shapes)
      mod.set_params(arg_params, aux_params, allow_missing=True)
      Load ONNX models using Amazon EI enabled Apache MXNet:
      from mxnet.contrib import onnx as onnx_mxnet
      # For ONNX models, use MXNet's import_model API as follows
      sym, arg, aux = onnx_mxnet.import_model(onnx_model_file)
      # Pass mx.eia() as the context when creating the Module object
      mod = mx.mod.Module(symbol=sym, context=mx.eia())
  • 20. How to choose? Considerations as you choose an instance and accelerator type combination for your model:
      ➢ What is the target latency SLA for your application, and what are your constraints?
      ➢ Start small and size up if you need more capacity.
      ➢ Input/output data payload size has an impact on latency.
      ➢ Convert to FP16 for lower latency and higher throughput.
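The "start small and size up" advice can be sketched as a lookup over the accelerator table from slide 12. This is only an illustration: the function name `smallest_accelerator` and the idea of filtering on a memory and throughput requirement are assumptions, not something the talk prescribes, and real sizing should be validated by measuring latency against your SLA.

```python
# Rows transcribed from slide 12, smallest first:
# (name, FP32 TOPS, FP16 TOPS, memory in GB, US price in $/hr).
ACCELERATORS = [
    ("eia1.medium", 1, 8, 1, 0.13),
    ("eia1.large", 2, 16, 2, 0.26),
    ("eia1.xlarge", 4, 32, 4, 0.52),
]


def smallest_accelerator(min_memory_gb, min_tops, precision="fp32"):
    """Return (name, hourly price) of the smallest size meeting both needs.

    Hypothetical helper: returns None when even the largest size is too small.
    """
    if precision not in ("fp32", "fp16"):
        raise ValueError("precision must be 'fp32' or 'fp16'")
    for name, fp32_tops, fp16_tops, memory_gb, price in ACCELERATORS:
        tops = fp16_tops if precision == "fp16" else fp32_tops
        if memory_gb >= min_memory_gb and tops >= min_tops:
            return name, price
    return None


# A model needing 1.5 GB of accelerator memory does not fit eia1.medium.
print(smallest_accelerator(min_memory_gb=1.5, min_tops=1))
```

Because the list is ordered from smallest to largest, the first match is also the cheapest one.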
  • 21. Peter Jones, Head of AI Engineering, Autodesk AI Lab; Liviu-Mihai Calin, AI Systems Engineer, Autodesk AI Lab
  • 22. MORE IS INEVITABLE
  • 23. LESS IS A REALITY
  • 24. Image courtesy of Tesla Motors, Inc. Image courtesy of Gensler. The Martian © 2015 Twentieth Century Fox. All rights reserved.
  • 25. OPPORTUNITY OF BETTER
  • 29. AI LAB
  • 32. Multi-view Convolutional Neural Network (MVCNN) [Diagram labels: 2d-conv (x5), batch-max, dense (x3), Embedding, Softmax Classifier]
  • 33. Variational Autoencoder (VAE)
  • 34. Instance Setup with Elastic Inference
      aws ec2 run-instances \
        --image-id <preconfigured_ami_id> \
        --instance-type <ec2_instance_type> \
        --key-name <key_name> \
        --subnet-id <subnet_id> \
        --security-group-ids <security_group_id> \
        --iam-instance-profile Name="iam_profile_name" \
        --elastic-inference-accelerator Type=eia1.<size>
      • Just like setting up a normal EC2 instance
      • Create the instance with a preconfigured AMI and a reference to the accelerator
      • A VPC endpoint allows the EC2 instance to connect to the accelerator (set up once)
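The launch command on this slide can also be assembled from a script, which makes it easier to vary the accelerator size while benchmarking. The sketch below only builds and prints the argument list; it uses the flags shown on the slide and nothing else. The function name and every value passed in the example call (`ami-EXAMPLE`, `subnet-EXAMPLE`, and so on) are hypothetical placeholders.

```python
import shlex

# Accelerator sizes listed in the talk (eia1.medium, eia1.large, eia1.xlarge).
VALID_SIZES = ("medium", "large", "xlarge")


def build_run_instances_command(image_id, instance_type, key_name, subnet_id,
                                security_group_id, iam_profile_name,
                                accelerator_size):
    """Return the `aws ec2 run-instances` argument list from the slide."""
    if accelerator_size not in VALID_SIZES:
        raise ValueError(f"unknown accelerator size: {accelerator_size!r}")
    return [
        "aws", "ec2", "run-instances",
        "--image-id", image_id,
        "--instance-type", instance_type,
        "--key-name", key_name,
        "--subnet-id", subnet_id,
        "--security-group-ids", security_group_id,
        "--iam-instance-profile", f"Name={iam_profile_name}",
        "--elastic-inference-accelerator", f"Type=eia1.{accelerator_size}",
    ]


# All identifiers below are placeholders, not real resources.
command = build_run_instances_command(
    image_id="ami-EXAMPLE",
    instance_type="c5.large",
    key_name="my-key",
    subnet_id="subnet-EXAMPLE",
    security_group_id="sg-EXAMPLE",
    iam_profile_name="my-ei-profile",
    accelerator_size="medium",
)
print(shlex.join(command))  # shell-quoted string, ready to paste or log
```

Keeping the command as a list rather than a string means it can be passed to `subprocess.run` without shell quoting concerns.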
  • 35. Using Elastic Inference
      • Serve the saved model with the EI version of the TensorFlow model server
      • Send requests to the server to predict with test data
      • Elastic Inference takes care of accelerating the operations
  • 36. Creating a Saved Model
      import tensorflow as tf
      classifier = tf.estimator.Estimator(…)
      # Placeholder for one request: a single sample of 80 grayscale 128x128 views
      input_tensor = tf.placeholder(dtype=tf.float32, shape=[1, 80, 128, 128, 1], name='images_tensor')
      input_map = {'images': input_tensor}
      # Export the trained estimator as a SavedModel that accepts raw tensors
      classifier.export_savedmodel(model_dir, tf.estimator.export.build_raw_serving_input_receiver_fn(input_map))
      • The MVCNN model is in TF Estimator format and has been trained
      • It expects grayscale multi-view images named "images" as input
      • Dimensions: [batch_size, num_views, width, height, color_channels]
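Since the talk notes that input/output payload size affects latency, it can help to work out how large one request is. The sketch below computes the raw size of the float32 input tensor with the shape from this slide. The helper name is hypothetical, and the figure excludes any protobuf or gRPC framing overhead, which is not described in the talk.

```python
from math import prod


def tensor_payload_bytes(shape, bytes_per_element=4):
    """Raw size of a dense tensor; 4 bytes per element corresponds to float32."""
    return prod(shape) * bytes_per_element


# Shape from the slide: [batch_size, num_views, width, height, color_channels]
shape = [1, 80, 128, 128, 1]
size = tensor_payload_bytes(shape)
print(f"{prod(shape)} elements, {size} bytes, {size / 2**20} MiB")
```

For this shape the request carries 1,310,720 float32 values, which is 5 MiB of tensor data per prediction.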
  • 37. Predicting with EI TensorFlow Serving
      AmazonEI_TensorFlow_Serving_v1.11_v1 --model_name=mvcnn --model_base_path=model_dir --port=9000
      • Have one process serve the previously exported saved model
      • Have another process send requests containing input data
  • 38. Predicting with EI TensorFlow Serving
      import grpc
      import tensorflow as tf
      from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

      tf.app.flags.DEFINE_string('server', 'localhost:9000', 'PredictionService host:port')
      FLAGS = tf.app.flags.FLAGS
      # Open a gRPC channel to the model server started on the previous slide
      channel = grpc.insecure_channel(FLAGS.server)
      stub = prediction_service_pb2_grpc.PredictionServiceStub(channel)
      # Build a request against the exported model's default signature
      request = predict_pb2.PredictRequest()
      request.model_spec.name = 'mvcnn'
      request.model_spec.signature_name = 'serving_default'
      input_array = get_next_input()  # data-loading helper; its definition is not shown on the slide
      request.inputs['images'].CopyFrom(tf.contrib.util.make_tensor_proto(input_array, dtype=tf.float32, shape=[1, 80, 128, 128, 1]))
      result = stub.Predict(request, 30.0)  # 30-second timeout
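To compare instance and accelerator combinations, the prediction call needs to be timed. The talk does not show how its measurements were taken, so the following is only a minimal timing sketch. `measure_latency` and `fake_predict` are hypothetical names; in practice `fake_predict` would be replaced by a function that wraps `stub.Predict(request, 30.0)`.

```python
import statistics
import time


def measure_latency(predict_fn, make_input, warmup=2, runs=10):
    """Time predict_fn over several runs, after a few untimed warm-up calls."""
    for _ in range(warmup):
        predict_fn(make_input())
    samples = []
    for _ in range(runs):
        payload = make_input()  # build the input outside the timed region
        start = time.perf_counter()
        predict_fn(payload)
        samples.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(samples),
        "median_s": statistics.median(samples),
        "max_s": max(samples),
    }


def fake_predict(_payload):
    """Stand-in for a real prediction call; sleeps briefly instead."""
    time.sleep(0.01)


stats = measure_latency(fake_predict, lambda: None)
print(stats)
```

The warm-up calls are excluded because the first requests to a freshly started server are often slower than steady-state requests.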
  • 40. Results - MVCNN [Chart: inference time (seconds) versus hourly cost ($)]
  • 41. Results - VAE [Chart: inference time (seconds) versus hourly cost ($)]
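The two chart axes, inference time and hourly cost, can be combined into a single cost-per-prediction figure. The sketch below shows that arithmetic under a strong assumption that is not from the talk: requests are served one at a time and the instance is kept fully busy. The numbers in the example call are made up for illustration and are not Autodesk's results.

```python
def cost_per_1000_inferences(hourly_cost_usd, inference_time_s):
    """Cost of 1000 sequential predictions on a fully utilized instance."""
    if inference_time_s <= 0:
        raise ValueError("inference_time_s must be positive")
    inferences_per_hour = 3600.0 / inference_time_s
    return 1000.0 * hourly_cost_usd / inferences_per_hour


# Hypothetical inputs: $0.50 per hour and 0.2 seconds per prediction.
print(round(cost_per_1000_inferences(0.50, 0.2), 4))
```

Under this model a configuration is cheaper per prediction whenever the product of its hourly cost and its inference time is lower, so a slower but much cheaper setup can still win.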
  • 42. Autodesk, the Autodesk logo, and Revit are registered trademarks or trademarks of Autodesk, Inc., and/or its subsidiaries and/or affiliates in the USA and/or other countries. All other brand names, product names, or trademarks belong to their respective holders. Autodesk reserves the right to alter product and services offerings, and specifications and pricing at any time without notice, and is not responsible for typographical or graphical errors that may appear in this document. © 2018 Autodesk. All rights reserved.
  • 43. Summary
      • EI accelerators are available in a range of sizes suitable for inference workloads.
      • Configure them to launch with any EC2 instance type, and scale capacity with Auto Scaling groups.
      • EI configuration is also available through CloudFormation as you configure your instance resource.
      • Deploy TensorFlow, MXNet, and ONNX models with no code changes.
      • Integrated with SageMaker for a fully managed experience.
      aws.amazon.com/machine-learning/elastic-inference/
  • 44. Thank you!