Demystifying Machine Learning with AWS (ACD Mumbai)

Demystifying Machine
Learning on AWS
Jayesh Bapu Ahire
@Jayesh_Ahire1

Agenda:
1. Machine Learning 101
2. Why ML in cloud?
3. AI & ML capabilities of AWS
4. Exploring Amazon Sagemaker
5. Demo

Machine Learning 101
Training Data Training Model

Machine Learning End to End Pipeline
Source: https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-mlconcepts.html

OBJECTIVE:
Democratize AI
by Making it accessible, fast and
useful for enterprise and
developers.

PROBLEMS:
● Infrastructure
○ Management
○ Hybrid
● Skilled talent
● Ease of setup

Machine Learning in Cloud
● The cloud’s pay-per-use model
● Easy for enterprises to experiment with ML capabilities and scale
up as projects go into production and demand increases.
● The cloud makes intelligent capabilities accessible without
requiring advanced skills in artificial intelligence or data science.
● AWS, Microsoft Azure, and Google Cloud Platform offer many
machine learning options that don’t require deep knowledge of AI,
machine learning theory, or a team of data scientists.

AI & ML capabilities of AWS
ML Frameworks + Infrastructure ML Services AI Services
Frameworks
Interfaces
+
Infrastructure
Amazon Sagemaker
Build
+
Train
+
Deploy
Personalize Forecast Rekognition
Comprehend Textract Polly
Lex Translate Transcribe

Scenario 1:
1. You’re a developer with very less or
no knowledge of ML Looking to
integrate some sort of AI capabilities
in your application.
2. You have very general use case.

Scenario 2:
You’re a developer or data scientist and
you want the ability to build, train, and
deploy machine learning models quickly
without a hassle of choosing
frameworks, interfaces and configuring
infrastructure.

Let’s explore more about
Amazon Sagemaker

What is Sagemaker & What it provides?
● Fully managed machine learning service.
● Quickly and easily build and train machine learning models, and then
directly deploy them into a production-ready hosted environment.
● Integrated Jupyter authoring notebook instance
● Common machine learning algorithms that are optimized to run efficiently
against extremely large data in a distributed environment.
● Bring-your-own-algorithms and frameworks
● Flexible distributed training options that adjust to your specific workflows.

Amazon Sagemaker Neo
Train once, run anywhere

Build Train Deploy
Collect & prepare training data
Data labeling & pre-built
notebooks for common
problems
Set up & manage environments
for training
One-click training using Amazon
EC2 On-Demand or Spot
instances
Deploy model in production
One-click deployment
Choose & optimize your ML
algorithm
Built-in, high-performance
algorithms and hundreds of
ready to use algorithms in AWS
Marketplace
Train & tune model
Train once, run anywhere &
model optimization
Scale & manage the production
environment
Fully managed with auto-scaling
for 75% less

Amazon SageMaker: Open Source Containers
● Customize them
● Run them locally for development and testing
● Run them on SageMaker for training and prediction at scale
https://github.com/aws/sagemaker-tensorflow-containers
https://github.com/aws/sagemaker-mxnet-containers

Amazon SageMaker: Bring Your Own Container
● Prepare the training code in Docker container
● Upload container image to Amazon Elastic Container Registry (ECR)
● Upload training dataset to Amazon S3/FSx/EFS
● Invoke Create Training Job API to execute a SageMaker training job
SageMaker training job pulls the container image from Amazon ECR, reads
the training data from the data source, configures the training job with
hyperparameter inputs, trains a model, and saves the model to model_dir so
that it can be deployed for inference later.
https://github.com/aws/sagemaker-container-support

Distributed Training At Scale on Amazon SageMaker
● Training on Amazon SageMaker can automatically distribute processing
across a number of nodes - including P3 instances
● You can choose from two data distribution types for training ML models
○ Fully Replicated - This will pass every file in the input to every
machine
○ Sharded S3 Key - This will separate and distribute the files in the
input across the training nodes
Overall, sharding can run faster but it depends on the algorithm

Amazon SageMaker: Local Mode Training
Enabling experimentation speed
● Train with local notebooks
● Train on notebook instances
● Iterate faster a small sample of the dataset locally no waiting for a new
● training cluster to be built each time
● Emulate CPU (single and multi-instance) and GPU (single instance) in local
mode
● Go distributed with a single line of code

Automatic Model Tuning on Amazon SageMaker
Hyperparameter Optimizer
● Amazon SageMaker automatic model tuning predicts hyperparameter
values, which might be most effective at improving fit.
● Automatic model tuning can be used with the Amazon SageMaker
○ Built-in algorithms,
○ Pre-built deep learning frameworks, and
○ Bring-your-own-algorithm containers
http://github.com/awslabs/amazon-sagemakerexamples/tree/master/hyperparameter tuning

Amazon SageMaker: Accelerating ML Training
Faster start times and training job execution time
● Two modes: File Mode and Pipe Mode
○ input mode parameter in sagemaker.estimator.estimator
● File Mode: S3 data source or file system data source
○ When using S3 as data source, training data set is downloaded to EBS volumes
○ Use file system data source (Amazon EFS or Amazon FSx for Lustre) for faster
training
○ startup and execution time. Different data formats supported: CSV, protobuf, JSON,
libsvm (check algo docs!)
● Pipe Mode streams the data set to training instances
○ This allows you to process large data sets and training starts faster
○ Dataset must be in recordio-encoded protobuf or csv format

Amazon SageMaker: Fully-Managed Spot Training
Reduce training costs at scale
● Managed Spot training on SageMaker to reduce training costs by up to 90%
● Managed Spot Training is available in all training configurations:
○ All instance types supported by Amazon SageMaker
○ All models: built-in algorithms, built-in frameworks, and custom models
○ All configurations: single instance training, distributed training, and
automatic model tuning.
● Setting it up is extremely simple
○ If you're using the console, just switch the feature on.
○ If you're working with the Amazon SageMaker SDK just set
train_use_spot_instances to true in the Estimator constructor.

Amazon SageMaker: Secure Machine Learning
● No retention of customers data
● SageMaker provides encryption in transit
● Encryption at rest everywhere
● Compute isolation - instances allocated for computation are never shared with
others
● Network isolation: all compute instances run inside private service managed
VPCs
● Secure, fully managed infrastructure: Amazon Sagemaker take care of patching
and keeping instances up-to-date
● Notebook security - Jupyter notebooks can be operated without internet access
and bound to secure customer VPCs

How To Train a Model With Amazon SageMaker
To train a model in Amazon SageMaker, you create a training job. The training job
includes the following information:
● The URL of the Amazon Simple Storage Service (Amazon S3) bucket or the file
● system id of the file system where you've stored the training data.
● The compute resources that you want Amazon SageMaker to use for model
training. Compute resources are ML compute instances that are managed by
Amazon SageMaker.
● The URL of the S3 bucket where you want to store the output of the job.
● The Amazon Elastic Container Registry path where the training code is stored.

Amazon SageMaker Training: Getting Started
To train a model in Amazon SageMaker, you will need the following:
● A dataset. Here we will use the MNIST (Modified National Institute of Standards and
Technology database) dataset. This dataset provides a training set of 50,000 example
images of handwritten single-digit numbers, a validation set of 10,000 images, and a test
dataset of 10,000 images.
● An algorithm. Here we will use the Linear Learner algorithm provided by Amazon
● An Amazon Simple Storage Service (Amazon S3) bucket to store the training data and the
model artifacts
● An Amazon SageMaker notebook instance to prepare and process data and to train and
deploy a machine learning model.
● A Jupyter notebook to use with the notebook instance
● For model training, deployment, and validation, I will use the high-level Amazon
SageMaker Python SDK

Amazon SageMaker Training: Getting Started
● Create the S3 bucket
● Create an Amazon SageMaker Notebook instance by going here:
https://console.aws.amazon.com/sagemaker/
● Choose Notebook instances, then choose Create notebook instance.
● On the Create notebook instance page, provide the Notebook instance name,
choose ml.t2.medium for instance type (least expensive instance) For IAM role,
choose Create a new role, then choose Create role.
● Choose Create notebook instance.
In a few minutes, Amazon SageMaker launches an ML compute instance
and attaches an ML storage volume to it. The notebook instance has a
preconfigured Jupyter notebook server and a set of Anaconda libraries.

Linear Learner with MNIST dataset example
● Provide the S3 bucket and prefix that you want to use for training and model
artifacts. This should be within the same region as the Notebook instance,
training, and hosting
● The IAM role arn used to give training and hosting access to your data
● Download the MNIST dataset
● Amazon SageMaker implementation of Linear Learner takes recordio wrapped
protobuf, where as the data we have is a pickle-ized numpy array on disk.
● This data conversion will be handled by the Amazon SageMaker Python SDK,
imported as sagemaker

Train the model
Create and Run a Training Job with Amazon SageMaker Python SDK
● To train a model in Amazon Sagemaker, you can use
○ Amazon SageMaker Python SDK or
○ AWS SDK for Python (Boto 3) or
○ AWS console
● For this exercise, I will use the notebook instance and the Python SDK
● The Amazon SageMaker Python SDK includes the
sagemaker.estimator.Estimator estimator, which can be used with any
algorithm.
● To run a model training job import the Amazon SageMaker Python SDK and get
the Linear Learner container

Scenario 3:
You’re a Machine Learning Expert and
want to develop your own pipeline on
high class infrastructure provided by
AWS.

Machine Learning end to end pipeline using AWS
Build
1. Pre-build algorithms
& notebooks
2. Data Labeling:
Ground Truth
3. AWS marketplace for
ML
Deploy
1. one-click deployment
and hosting
Train
1. One-click model
training and tuning
2. Sagemaker Neo
3. Sagemaker RL
03
01 02

Take free ML on AWS
course on Coursera

Demystifying Machine Learning with AWS (ACD Mumbai)

Recommended

Recommended

More Related Content

Similar to Demystifying Machine Learning with AWS (ACD Mumbai)

Similar to Demystifying Machine Learning with AWS (ACD Mumbai) (20)

Recently uploaded

Recently uploaded (20)

Demystifying Machine Learning with AWS (ACD Mumbai)