2. 2
Strategies for Developing ML Solutions on AWS
01 AWS AI/ML Services
- Polly – Text to Speech
- Rekognition – Computer Vision
- Comprehend – Natural Language Processing
- Lex – Chatbots
02 Pretrained Models on AWS Marketplace
Subscribe to pre-trained models and deploy them to SageMaker from the web console
03 Accelerated ML development with SageMaker JumpStart
A collection of:
- Pre-trained models
- Solution templates
04 Dev Environments for .py and .ipynb files
- EMR Jupyter or Zeppelin Notebooks
- Notebooks with Glue Dev endpoint
- Cloud9 development
- SageMaker Jupyter Notebooks
- SageMaker Studio
3. 3
What is SageMaker?
SageMaker is a fully managed AWS service that supports almost every part of the ML
development lifecycle.
Data
• SageMaker Ground Truth – automated data labelling
• SageMaker Data Wrangler – data preprocessing pipelines
• SageMaker Clarify – bias detection
Training
• SageMaker Experiments – experiment tracking
• SageMaker Autopilot – AutoML
• SageMaker Feature Store – feature store
Inference
• SageMaker Model Monitor – detect drift in models
• SageMaker Elastic Inference – lower-cost inference acceleration
• SageMaker Neo – compile models for multiple hardware platforms
Full list of Features: https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
4. 4
My Personal Challenges with SageMaker
1. Poor documentation
2. APIs with a deluge of parameters
3. Docker containers are king, and debugging them is difficult
5. 5
Ways to interact with SageMaker resources
1 AWS Console – web interface
2 AWS CLI – command line
3 AWS SDK for Python (Boto3) – low-level API
4 SageMaker Python SDK – high-level API
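To make the low-level route in the list above concrete, here is a minimal sketch of the request that Boto3's `create_training_job` call expects. All concrete values (job name, image URI, role ARN, bucket paths) are hypothetical placeholders, and the helper names `build_training_job_request` and `submit_training_job` are made up for illustration. The request is built as a plain dictionary, so nothing is sent to AWS unless you call the submit function with valid credentials.

```python
from typing import Any, Dict


def build_training_job_request(
    job_name: str,
    image_uri: str,
    role_arn: str,
    train_s3_uri: str,
    output_s3_uri: str,
    instance_type: str = "ml.c5.xlarge",
) -> Dict[str, Any]:
    """Assemble keyword arguments for the SageMaker CreateTrainingJob API."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,   # Docker image that runs the training
            "TrainingInputMode": "File",  # copy S3 data into the container
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceType": instance_type,
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
    }


def submit_training_job(request: Dict[str, Any]) -> str:
    """Send the request with Boto3 (needs AWS credentials; not called here)."""
    import boto3  # imported lazily so the sketch runs without AWS access

    client = boto3.client("sagemaker")
    response = client.create_training_job(**request)
    return response["TrainingJobArn"]


# Hypothetical placeholder values, for illustration only.
request = build_training_job_request(
    job_name="demo-training-job",
    image_uri="<account>.dkr.ecr.<region>.amazonaws.com/<image>:<tag>",
    role_arn="arn:aws:iam::123456789012:role/HypotheticalSageMakerRole",
    train_s3_uri="s3://hypothetical-bucket/train/",
    output_s3_uri="s3://hypothetical-bucket/output/",
)
print(len(request), "top-level parameters")
```

Even this small job needs seven top-level parameters, which is the kind of verbosity the high-level SageMaker Python SDK hides behind an Estimator object.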
6. 6
SageMaker Development Environments
01 SageMaker Jupyter Notebooks
Similar to Google Colab, but you have to create your notebook instance manually
02 SageMaker Studio
A JupyterLab-style experience that provides full access to SageMaker’s resources
7. 7
Jupyter Notebooks Disadvantages
• They are slow to start up: 5 – 10 times slower than SageMaker Studio notebooks
• They lack the integrated notebook sharing features present in SageMaker Studio
• The development environment has a fixed instance type
o In SageMaker Studio you can switch the instance type your notebook runs on
o This allows better cost savings with SageMaker Studio
9. 9
Basic workflow with a SageMaker Studio Notebook
• JupyterServerApp: created from Jupyter notebooks. Infrastructure: EFS, Jupyter Server
• KernelGatewayApp: created from Docker images. Infrastructure: ml.t3.medium
• CreateTrainingJob. Infrastructure: ml.c5.xlarge
Source: https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks.html
10. 10
SageMaker Instance Types
1 General purpose (no GPUs)
• ml.t3.medium (Free-Tier Eligible) >> Fast launch
• ml.m5.large >> Fast launch
2 Compute optimized (no GPUs)
• ml.c5.large >> Fast launch
3 Accelerated computing (1+ GPUs)
• ml.g4dn.xlarge >> Fast launch
4 Memory optimized (no GPUs)
• ml.r5.large
Fast launch: optimized to launch in under 2 minutes
Full list can be found here: https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-available-instance-types.html
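The naming scheme on this slide can be read mechanically: `ml.<family>.<size>`, where the family decides the category. The sketch below encodes only the families and fast-launch types listed on the slide; the function name `describe_instance` is hypothetical, and the tables are not a complete catalogue.

```python
# Families and fast-launch types exactly as listed on the slide (not exhaustive).
INSTANCE_FAMILIES = {
    "t3": "General purpose (no GPUs)",
    "m5": "General purpose (no GPUs)",
    "c5": "Compute optimized (no GPUs)",
    "g4dn": "Accelerated computing (1+ GPUs)",
    "r5": "Memory optimized (no GPUs)",
}
FAST_LAUNCH = {"ml.t3.medium", "ml.m5.large", "ml.c5.large", "ml.g4dn.xlarge"}


def describe_instance(instance_type: str) -> str:
    """Return the slide's category for an instance type such as 'ml.c5.large'."""
    parts = instance_type.split(".")
    if len(parts) != 3 or parts[0] != "ml":
        raise ValueError(f"not a SageMaker instance type: {instance_type!r}")
    family = parts[1]
    category = INSTANCE_FAMILIES.get(family, "family not listed on the slide")
    launch = "fast launch" if instance_type in FAST_LAUNCH else "regular launch"
    return f"{instance_type}: {category}, {launch}"


for name in ("ml.t3.medium", "ml.g4dn.xlarge", "ml.r5.large"):
    print(describe_instance(name))
```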
12. 12
SageMaker Algorithms: JumpStart
Source: https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks.html
Pre-trained Models and Solution Templates
• Pretrained models and frameworks are stored as Docker images
• Model API – load a pretrained model
• Processor API – run preprocessing jobs
• Transformer API – run batch transform jobs
• Predictor API – class supplied to the model to make real-time predictions using SageMaker endpoints
• Estimator API – train or fine-tune a pretrained model
Docker Images: https://sagemaker.readthedocs.io/en/stable/doc_utils/pretrainedmodels.html
16. 16
Frameworks Example: XGBoost
Source: https://sagemaker.readthedocs.io/en/v2.38.0/frameworks/index.html
Use as a framework: custom logic is defined in a script (script mode)
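In script mode, the training container runs your own Python script: hyperparameters arrive as command-line arguments, and SageMaker exposes paths through environment variables such as `SM_MODEL_DIR` and `SM_CHANNEL_TRAIN`. The following is a minimal sketch of such an entry-point script, not the code shown on the slide. The hyperparameter choices, the `train.csv` file name and the saved model file name are assumptions for illustration.

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters (CLI args) and SageMaker paths (environment variables)."""
    parser = argparse.ArgumentParser()
    # Hyperparameters set on the estimator are passed as command-line arguments.
    parser.add_argument("--max_depth", type=int, default=5)
    parser.add_argument("--eta", type=float, default=0.2)
    parser.add_argument("--num_round", type=int, default=100)
    # Inside the training container SageMaker sets these environment variables;
    # the fallbacks are the conventional container paths.
    parser.add_argument("--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--train", default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"))
    return parser.parse_args(argv)


def train(args):
    """Custom training logic; only runs where xgboost and the data are available."""
    import xgboost as xgb  # imported lazily so argument parsing can be tried anywhere

    # Assumption: a headerless CSV named train.csv with the label in column 0.
    data_uri = os.path.join(args.train, "train.csv") + "?format=csv&label_column=0"
    dtrain = xgb.DMatrix(data_uri)
    params = {"max_depth": args.max_depth, "eta": args.eta, "objective": "reg:squarederror"}
    booster = xgb.train(params, dtrain, num_boost_round=args.num_round)
    # Assumption: the file name is illustrative; anything written to model_dir is uploaded.
    booster.save_model(os.path.join(args.model_dir, "xgboost-model.json"))


# In a real entry point you would call train(parse_args()) under a __main__ guard.
# Here we only show how the arguments are parsed.
args = parse_args(["--max_depth", "3", "--eta", "0.1"])
print(args.max_depth, args.eta, args.num_round)  # prints: 3 0.1 100
```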
17. 17
Frameworks Example: XGBoost
Source: https://sagemaker.readthedocs.io/en/v2.38.0/frameworks/index.html
Use as a built-in algorithm: the container is retrieved from Docker images
The Estimator used here is the base SageMaker Estimator
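For the built-in route, a sketch with the SageMaker Python SDK (v2) might look like the following. It is an illustration under assumptions rather than the code from the slide: the region, container version `1.2-1`, hyperparameter values and function name are all hypothetical choices, and the SDK is imported lazily so the block can be loaded without AWS access. Calling the function starts a real training job.

```python
from typing import Dict

# Hypothetical hyperparameters for the built-in XGBoost container.
HYPERPARAMETERS: Dict[str, str] = {
    "objective": "reg:squarederror",
    "num_round": "100",
    "max_depth": "5",
}


def train_builtin_xgboost(role: str, train_s3_uri: str, output_s3_uri: str,
                          region: str = "us-east-1"):
    """Train built-in XGBoost with the base Estimator (needs AWS credentials)."""
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    # Look up the Docker image of the built-in algorithm for this region.
    image_uri = image_uris.retrieve("xgboost", region=region, version="1.2-1")

    # The base Estimator only needs the image; no training script is supplied.
    estimator = Estimator(
        image_uri=image_uri,
        role=role,
        instance_count=1,
        instance_type="ml.c5.xlarge",
        output_path=output_s3_uri,
    )
    estimator.set_hyperparameters(**HYPERPARAMETERS)
    # You only supply the data; the algorithm itself lives in the container.
    estimator.fit({"train": TrainingInput(train_s3_uri, content_type="text/csv")})
    return estimator


print(sorted(HYPERPARAMETERS))
```

The contrast with script mode is that no custom code is passed in: the image alone defines the algorithm, and only data and hyperparameters change.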
Some features are accessible directly from the AWS web console,
while others are accessible only programmatically
Switch to Deployed Jupyter Notebook
Advantages of Studio over Jupyter notebooks
- Easy to spin up and shut down training instances
- Cost savings, because you can choose the instance type you want
- Collaboration between user profiles, because sharing notebooks is easier than going through GitHub
****** Switch to Show Domain Creation ************
- You create a domain using either AWS IAM Identity Center (SSO) or AWS IAM to authenticate
- Afterwards, you create user profiles: each profile corresponds to a single user with a unique home directory on the domain's EFS volume
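The same two steps can also be done programmatically. As a minimal sketch, the dictionaries below are shaped for Boto3's `create_domain` and `create_user_profile` calls; the helper names and every concrete value (role ARN, VPC, subnet and domain IDs, user name) are hypothetical placeholders, and nothing is sent to AWS.

```python
from typing import Any, Dict, List


def build_domain_request(domain_name: str, execution_role_arn: str, vpc_id: str,
                         subnet_ids: List[str], use_sso: bool = False) -> Dict[str, Any]:
    """Keyword arguments for the SageMaker CreateDomain API."""
    return {
        "DomainName": domain_name,
        "AuthMode": "SSO" if use_sso else "IAM",  # the two authentication options
        "DefaultUserSettings": {"ExecutionRole": execution_role_arn},
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
    }


def build_user_profile_request(domain_id: str, user_name: str) -> Dict[str, Any]:
    """Keyword arguments for CreateUserProfile: one profile per user."""
    return {"DomainId": domain_id, "UserProfileName": user_name}


# Hypothetical placeholder values, for illustration only.
domain_request = build_domain_request(
    domain_name="demo-domain",
    execution_role_arn="arn:aws:iam::123456789012:role/HypotheticalSageMakerRole",
    vpc_id="vpc-00000000",
    subnet_ids=["subnet-00000000"],
)
profile_request = build_user_profile_request("d-hypothetical", "alice")
print(domain_request["AuthMode"], profile_request["UserProfileName"])  # prints: IAM alice
```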
****** Switch to Show Notebook Creation from docker images *****
Show notebook creation from docker images
Show terminals: System terminal and Docker Image terminal
Show JumpStart
Algorithms are already pre-coded
You only supply your data to these algorithms
***** Link to common properties of algorithms ********