Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Costruisci modelli di Machine Learning con Amazon SageMaker Autopilot

1,097 views

Published on

Amazon SageMaker AutoPilot è una funzionalità di Amazon SageMaker che crea automaticamente il miglior modello di apprendimento automatico per il tuo set di dati. Con SageMaker Autopilot, si fornisce un set di dati tabellare e si seleziona la variabile target da prevedere, che può essere numerica o categorica. SageMaker Autopilot esplorerà automaticamente diverse soluzioni per trovare il modello migliore. È quindi possibile distribuire direttamente il modello in produzione con un solo clic o esplorare le soluzioni consigliate con Amazon SageMaker Studio per migliorare ulteriormente la qualità del modello. In questo webinar approfondiremo questa capacità, con dimostrazioni pratiche su come utilizzare il servizio.

  • Be the first to comment

Costruisci modelli di Machine Learning con Amazon SageMaker Autopilot

  1. 1. Amazon SageMaker Autopilot Giuseppe Angelo Porcelli Principal, Specialist Solutions Architect – Machine Learning Build models automatically with Amazon SageMaker
  2. 2. © 2020, Amazon Web Services, Inc. or its Affiliates. Agenda • Amazon SageMaker Overview • Amazon SageMaker Autopilot • Demo • Automatic Model Tuning in Amazon SageMaker • Resources to get you started
  3. 3. © 2020, Amazon Web Services, Inc. or its Affiliates. THE ML WORKFLOW IS ITERATIVE AND COMPLEX Collect and prepare training data Choose or bring your own ML algorithm Set up and manage environments for training Train, debug, and tune models Manage training runs Deploy model in production Monitor models Validate predictions Scale and manage the production environment
  4. 4. © 2020, Amazon Web Services, Inc. or its Affiliates. AMAZON SAGEMAKER SageMaker Ground Truth Fully managed data labeling SageMaker Studio Notebooks One-click notebooks with elastic compute One-click Training Makes it easy to run training jobs Automatic Model Tuning One-click hyperparameter optimization One-click Deployment Supports real-time, batch and multi-model Model Monitor Automatically detect concept drift SageMaker Processing Built-in Python, BYO R/Spark Built-in & bring- your-own algorithms Supervised and unsupervised algorithms SageMaker Experiments Capture, organize, and compare every step SageMaker Neo Train once, deploy anywhere AWS Marketplace Pre-built algorithms and models SageMaker Debugger Debug training runs Inf1/Amazon Elastic Inference High performance at lowest cost Managed Spot training Reduce training cost by 90% Amazon Augmented AI Add human review of model predictions SageMaker Studio SageMaker Autopilot Automatically build and train models
  5. 5. © 2020, Amazon Web Services, Inc. or its Affiliates. AutoML • AutoML aims at automating the process of building a model • Problem identification: looking at the data set, what class of problem are we trying to solve? • Algorithm selection: which algorithm is best suited to solve the problem? • Data preprocessing: how should data be prepared for best results? • Hyperparameter tuning: what is the optimal set of training parameters? • Black box vs. white box • Black box: the best model only à Hard to understand the model, impossible to reproduce it manually • White box: the best model, other candidates, full source code for preprocessing and training à See how the model was built, and keep tweaking for extra performance
  6. 6. © 2020, Amazon Web Services, Inc. or its Affiliates. AutoML with SageMaker Autopilot • SageMaker Autopilot covers all steps • Problem identification: looking at the data set, what class of problem are we trying to solve? • Algorithm selection: which algorithm is best suited to solve the problem? • Data preprocessing: how should data be prepared for best results? • Hyperparameter tuning: what is the optimal set of training parameters? • Autopilot is white box AutoML • You can understand how the model was built, and you can keep tweaking • Supported algorithms: regression and classification • Linear Learner • Factorization Machines • KNN • XGBoost
  7. 7. © 2020, Amazon Web Services, Inc. or its Affiliates. AutoML with SageMaker Autopilot 1. Upload the unprocessed dataset to S3 2. Configure the AutoML job • Location of dataset • Completion criteria 3. Launch the job 4. View the list of candidates and the autogenerated notebooks 5. Deploy the best candidate to a real-time endpoint, or use batch transform
  8. 8. © 2020, Amazon Web Services, Inc. or its Affiliates. AutoML with SageMaker Autopilot Dataset, target attribute Models Notebooks Metrics
  9. 9. © 2020, Amazon Web Services, Inc. or its Affiliates. SageMaker Autopilot from Studio Only S3 location and target variable required Optional control points: • dry-run vs complete mode • setting problem type • security settings API/SDK level control points: • number of candidate models to build • maximum time to take • model evaluation metric (accuracy, F1, RMSE)
  10. 10. © 2020, Amazon Web Services, Inc. or its Affiliates. SageMaker Autopilot whitebox AutoML Dataset Exploration Notebook • Dataset statistics: row-wise and column- wise • Suggested remedies for common data set problems
  11. 11. © 2020, Amazon Web Services, Inc. or its Affiliates. SageMaker Autopilot whitebox AutoML Model candidate notebook (fully runnable) • data transformers • featurization techniques applied • override points • algorithms considered • evaluation metric • hyper-parameter ranges • model search strategy • instances used
  12. 12. © 2020, Amazon Web Services, Inc. or its Affiliates. Recent Enhancements • Supporting AUC as an optimization metric: you can now direct Autopilot to use AUC for tuning in binary classification problems, resulting in higher performing models • Confidence scores on predictions: for classification problems, inference now returns the probability scores of each class • Minimum dataset size: reduced from 1000 to 500 rows
  13. 13. © 2020, Amazon Web Services, Inc. or its Affiliates. Demo
  14. 14. © 2020, Amazon Web Services, Inc. or its Affiliates. Resources on Amazon SageMaker Autopilot Documentation https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development.html https://sagemaker.readthedocs.io/en/stable/automl.html Notebooks https://github.com/awslabs/amazon-sagemaker-examples/tree/master/autopilot Blog posts https://aws.amazon.com/blogs/aws/amazon-sagemaker-autopilot-fully-managed-automatic-machine-learning/ Tutorials https://aws.amazon.com/getting-started/hands-on/create-machine-learning-model-automatically-sagemaker- autopilot/
  15. 15. © 2020, Amazon Web Services, Inc. or its Affiliates. Automatic model tuning in Amazon SageMaker
  16. 16. © 2020, Amazon Web Services, Inc. or its Affiliates. Algorithms require many hyperparameters Neural Networks Number of layers Hidden layer width Learning rate Embedding dimensions Dropout … XGBoost Tree depth Max leaf nodes Gamma Eta Lambda Alpha … Which values should I pick? Which ones are the most influential? How many combinations should I try?
  17. 17. © 2020, Amazon Web Services, Inc. or its Affiliates. Setting hyperparameters in Amazon SageMaker • Built-in algorithms • Python parameters set on the relevant estimator (KMeans, LinearLearner, etc.) xgb.set_hyperparameters(max_depth=5, eta=0.2, gamma=4) • Built-in frameworks • hyperparameters parameter passed to the relevant estimator (TensorFlow, MXNet, etc.) • This must be a Python dictionary tf_estimator = TensorFlow(…, hyperparameters={'epochs’: 1, ‘lr’: ‘0.01’}) • Your code must be able to accept them as command-line arguments (script mode) • Bring your own container • hyperparameters parameter passed to the Estimator • This must be Python dictionary • It’s automatically copied inside the container: /opt/ml/input/config/hyperparameters.json
  18. 18. © 2020, Amazon Web Services, Inc. or its Affiliates. Finding the optimal set of hyperparameters • Manual Search: ”I know what I’m doing” • Grid Search: “X marks the spot” • Typically training hundreds of models • Slow and expensive • Random Search: “Spray and pray” « Random Search for Hyper-Parameter Optimization », Bergstra & Bengio, 2012 • Works better and faster than Grid Search • But… but… but… it’s random! • Hyperparameter Optimization: use ML to predict hyperparameters • Training fewer models • Gaussian Process Regression and Bayesian Optimization • https://docs.aws.amazon.com/en_pv/sagemaker/latest/dg/automatic-model-tuning-how-it-works.html
  19. 19. © 2020, Amazon Web Services, Inc. or its Affiliates. Automatic Model Tuning in Amazon SageMaker 1. Define an Estimator the normal way 2. Define the metric to tune on • Pre-defined metrics for built-in algorithms and frameworks • Or anything present in the training log, provided that you pass a regular expression for it 3. Define parameter ranges to explore • Type: categorical (avoid if possible), integer, continuous (aka floating point) • Range of values • Scaling: linear, logarithmic, reverse logarithmic 4. Create an HyperparameterTuner • Estimator, metric, parameters, total number of jobs, number of jobs in parallel • Strategy: bayesian (default), or random search 5. Launch the tuning job with fit()
  20. 20. © 2020, Amazon Web Services, Inc. or its Affiliates. Workflow Training JobHyperparameter Tuning Job Tuning strategy Objective metrics Training Job Training Job Training Job Clients (console, notebook, IDEs, CLI) model name model1 model2 … objective metric 0.8 0.75 … eta 0.07 0.09 … max_depth 6 5 … …
  21. 21. © 2020, Amazon Web Services, Inc. or its Affiliates. Tips • Use the bayesian strategy for better, faster, cheaper results • Most customers use random search as a baseline, to check that bayesian performs better • Don’t run too many jobs in parallel • This gives the bayesian strategy fewer opportunities to predict • Instance limits! • Don’t run too many jobs • Bayesian typically requires 10x fewer jobs than random • Cost vs business benefits (beware of diminishing returns)
  22. 22. © 2020, Amazon Web Services, Inc. or its Affiliates. Resources on Automatic Model Tuning Documentation https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html https://sagemaker.readthedocs.io/en/stable/tuner.html Notebooks https://github.com/awslabs/amazon-sagemaker-examples/tree/master/hyperparameter_tuning Blog posts https://aws.amazon.com/blogs/aws/sagemaker-automatic-model-tuning/ https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-automatic-model-tuning-produces-better-models-faster/ https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-automatic-model-tuning-now-supports-early-stopping-of- training-jobs/ https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-automatic-model-tuning-becomes-more-efficient-with- warm-start-of-hyperparameter-tuning-jobs/ https://aws.amazon.com/blogs/machine-learning/amazon-sagemaker-automatic-model-tuning-now-supports-random-search-and- hyperparameter-scaling/
  23. 23. © 2020, Amazon Web Services, Inc. or its Affiliates. AWS Training & Certification https://www.aws.training Free on-demand courses to help you build new cloud skills Curriculum: Exploring the Machine Learning Toolset https://www.aws.training/Details/Curriculum?id=27155 Curriculum: Developing Machine Learning Applications https://www.aws.training/Details/Curriculum?id=27243 Curriculum: Machine Learning Security https://www.aws.training/Details/Curriculum?id=27273 Curriculum: Demystifying AI/ML/DL https://www.aws.training/Details/Curriculum?id=27241 Video: AWS Foundations: Machine Learning Basics https://www.aws.training/Details/Video?id=49644 Curriculum: Conversation Primer: Machine Learning Terminology https://www.aws.training/Details/Curriculum?id=27270 For more info on AWS T&C visit: https://aws.amazon.com/it/training/
  24. 24. Thanks!

×