
Machine learning using Kubernetes

Published in: Technology

  1. © 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Machine Learning using Kubernetes. Arun Gupta, @arungupta
  2. Centerpiece for digital transformation
  3. (no text on this slide)
  4. Machine Learning 101
  5. (no text on this slide)
  6. The Amazon ML stack: Broadest & deepest set of capabilities. FRAMEWORKS; INTERFACES; INFRASTRUCTURE (EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference, Inferentia, EC2 G4)
  7. The Amazon ML stack: Broadest & deepest set of capabilities. Ground Truth, Notebooks, Algorithms + Marketplace, Reinforcement Learning, Training, Optimization, Deployment, Hosting; ML Frameworks + Infrastructure: FRAMEWORKS, INTERFACES, INFRASTRUCTURE (EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference, Inferentia, EC2 G4)
  8. The Amazon ML stack: Broadest & deepest set of capabilities. ML Services: Rekognition Image, Rekognition Video, Polly, Transcribe, Translate, Comprehend, Lex, Forecast, Textract, Personalize (VISION, SPEECH, LANGUAGE, CHATBOTS, FORECASTING, RECOMMENDATIONS); Amazon SageMaker: Ground Truth, Notebooks, Algorithms + Marketplace, Reinforcement Learning, Training, Optimization, Deployment, Hosting; ML Frameworks + Infrastructure: FRAMEWORKS, INTERFACES, INFRASTRUCTURE (EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference, Inferentia, EC2 G4)
  9. The Amazon ML stack: Broadest & deepest set of capabilities (same services, Amazon SageMaker components, frameworks, interfaces, and infrastructure as slide 8)
  10. “A little less conversation, a little more action, please”. MACHINE LEARNING: the stack from slide 8. ANALYTICS: Amazon Redshift + Redshift Spectrum, Amazon QuickSight, Amazon EMR (Hadoop, Spark, Presto, Pig, Hive…19 total), Amazon Athena, Amazon Kinesis, Amazon Elasticsearch Service, AWS Glue. STORAGE: Amazon S3 Standard, Amazon S3 Standard-IA, Amazon S3 One Zone-IA, Amazon Glacier, Amazon S3 Intelligent-Tiering (NEW), Amazon EBS, Amazon S3 Glacier Deep Archive (NEW)
  11. Machine Learning using Kubernetes (shown against the full stack from slide 8)
  12. Machine Learning using Kubernetes. ML Frameworks + Infrastructure: FRAMEWORKS, INTERFACES, INFRASTRUCTURE (EC2 P3 & P3dn, EC2 C5, FPGAs, Greengrass, Elastic Inference, Inferentia, EC2 G4)
  13. Why Machine Learning on Kubernetes? Composability; Portability; Scalability. ON-PREMISES; CLOUD. Image: http://www.shutterstock.com/gallery-635827p1.html
  14. Amazon EKS: run Kubernetes in cloud. Managed Kubernetes control plane, attach data plane; native upstream Kubernetes experience; platform for enterprises to run production-grade workloads; integrates with additional AWS services
  15. Amazon EKS deployment. Diagram labels: mycluster.eks.amazonaws.com; Availability Zone 1; Availability Zone 2; Availability Zone 3; kubectl
  16. Getting started with Amazon EKS. eksctl CLI: create Amazon EKS clusters (eksctl.io); creates all resources needed for the cluster
  17. Creating an EKS cluster using eksctl. Auto-generated cluster name; 2x m5.large nodes; uses AWS EKS AMI; us-west-2 region; dedicated VPCs; static AMI resolver. GPU-powered cluster. Install
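The commands on slide 17 did not survive in the transcript. As a rough sketch, the two variants it names (a cluster created with no options, and a GPU-powered cluster) could be expressed as eksctl command lines like the ones built below. The snippet only assembles and prints the arguments; it does not call AWS. The helper name `eksctl_create_cluster_argv`, the `p3.8xlarge` instance type, and the node count are illustrative assumptions, and the listed properties (auto-generated name, 2x m5.large, us-west-2) appear to describe what eksctl chose by default at the time of the talk.

```python
def eksctl_create_cluster_argv(name=None, region=None, node_type=None, nodes=None):
    """Build an `eksctl create cluster` command line without running it.

    Hypothetical helper for illustration. Options left as None are omitted,
    so eksctl would apply its own defaults for them.
    """
    argv = ["eksctl", "create", "cluster"]
    if name is not None:
        argv.append(f"--name={name}")
    if region is not None:
        argv.append(f"--region={region}")
    if node_type is not None:
        argv.append(f"--node-type={node_type}")
    if nodes is not None:
        argv.append(f"--nodes={nodes}")
    return argv


# No options: everything is left to eksctl.
default_cluster = eksctl_create_cluster_argv()

# Assumption: p3.8xlarge stands in for "a GPU instance type".
gpu_cluster = eksctl_create_cluster_argv(node_type="p3.8xlarge", nodes=2)

print(" ".join(default_cluster))
print(" ".join(gpu_cluster))
```

Passing the printed line to a shell (or the list to `subprocess.run`) would actually create billable AWS resources, so the sketch stops at printing.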
  18. GPUs for Machine Learning training. Training maps to matrix multiplications; operations can be parallelized across 1,000s of cores; coupled with extremely high memory bandwidth. Matrix A × Matrix B = Matrix C for 3×3 matrices $A = (a_{ij})$ and $B = (b_{ij})$, with $c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + a_{i3} b_{3j}$ for $i, j \in \{1, 2, 3\}$
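The point of slide 18 is that every entry of Matrix C depends only on one row of A and one column of B, so all entries can be computed independently. A minimal pure-Python sketch of that product follows; the example values and the function name `matmul` are illustrative, and a real training job would use a GPU-backed library rather than nested loops.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    # c[i][j] uses only row i of `a` and column j of `b`, so the
    # rows * cols entries have no dependencies on each other and could
    # all be computed in parallel.
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]


A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
C = matmul(A, B)
print(C)  # [[30, 24, 18], [84, 69, 54], [138, 114, 90]]
```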
  19. Set up K8s for ML: option 1. Diagram labels: Train; Inference; Data; Trained model (steps 1 to 4)
  20. Create K8s cluster for ML: option 1. Create training cluster; create inference cluster
  21. Scaling the cluster
  22. Set up K8s for ML: option 2a. Train & inference. Diagram labels: Data; Trained model (steps 1 to 4); node labels role: train (x3), role: inference (x2)
  23. Create K8s cluster for ML: option 2. eksctl cluster configuration with two node groups; create cluster
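The configuration file from slide 23 is not in the transcript. As a hedged sketch, a cluster with two labelled node groups could be described in eksctl's `ClusterConfig` format as below. The cluster name, the instance types, and the node-group names are assumptions chosen for illustration; the node counts and `role` labels mirror the diagram on slide 22. The snippet builds the structure in Python and prints it as JSON.

```python
import json

# Assumptions: "ml-cluster", "p3.8xlarge" and "c5.xlarge" are placeholders.
cluster_config = {
    "apiVersion": "eksctl.io/v1alpha5",
    "kind": "ClusterConfig",
    "metadata": {"name": "ml-cluster", "region": "us-west-2"},
    "nodeGroups": [
        {
            "name": "train",
            "instanceType": "p3.8xlarge",
            "desiredCapacity": 3,
            "labels": {"role": "train"},
        },
        {
            "name": "inference",
            "instanceType": "c5.xlarge",
            "desiredCapacity": 2,
            "labels": {"role": "inference"},
        },
    ],
}

rendered = json.dumps(cluster_config, indent=2)
print(rendered)
```

eksctl config files are normally written in YAML; because YAML parsers generally accept JSON, output like this saved to a file should be usable with `eksctl create cluster -f <file>`, though that has not been verified here.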
  24. Set up K8s for ML: option 2b. Train, inference, & applications. Node labels: role: train (x3), role: inference (x2), role: apps (x2)
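With nodes labelled by role as on slides 22 and 24, a workload can be steered to the matching nodes with a Kubernetes `nodeSelector`. The sketch below builds such a Pod manifest as a Python dictionary. The helper name `pod_for_role`, the Pod name, and the image reference are hypothetical, and the `nvidia.com/gpu` limit assumes the NVIDIA device plugin is installed on the GPU nodes.

```python
import json


def pod_for_role(name, image, role, gpus=0):
    """Return a Pod manifest that only schedules on nodes labelled role=<role>."""
    container = {"name": name, "image": image}
    if gpus:
        # Extended resource exposed by the NVIDIA device plugin (assumed present).
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "nodeSelector": {"role": role},
            "restartPolicy": "Never",
            "containers": [container],
        },
    }


# Hypothetical image reference, for illustration only.
training_pod = pod_for_role("fashion-mnist-train", "example.com/train:latest", "train", gpus=1)
print(json.dumps(training_pod, indent=2))
```

The same helper with `role="inference"` or `role="apps"` would target the other node groups in option 2b.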
  25. Challenges in setting up containers for ML
  26. AWS deep learning containers. KEY FEATURES: customizable container images; support for TensorFlow, Apache MXNet; single and multi-node training and inference; pre-packaged Docker container images fully configured and validated; best performance and scalability without tuning; works with Amazon EKS, Amazon ECS, and Amazon EC2
  27. 16 container images. Training, Inference; GPU, CPU; Python 2.7, Python 3.6
  28. ML on K8s: without Kubeflow. Credits: @aronchik
  29. ML on K8s: with Kubeflow. Credits: @aronchik
  30. What’s in Kubeflow?
  31. MNIST database. Database of gray-scaled handwritten digits; training set of 60k; test set of 10k; size-normalized (28x28 pixels); centered in a fixed-size image
  32. Fashion MNIST. Database of Zalando’s article images; labels assigned to 10 items; drop-in replacement for MNIST
  33. TensorFlow. Open source library to develop and train ML models; created by Google Brain team; can run on desktop, servers, mobiles, edge devices
  34. AWS is the platform of choice to run TensorFlow. [figure missing from transcript] of all TensorFlow workloads in the cloud runs on AWS
  35. Train twice as fast with TensorFlow. STOCK TENSORFLOW: 65% scaling efficiency with 256 GPUs. AWS-OPTIMIZED TENSORFLOW: 90% scaling efficiency with 256 GPUs
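To put the two efficiency figures on slide 35 side by side: scaling efficiency is commonly defined as achieved speedup divided by ideal linear speedup. Under that assumed definition, the short calculation below converts each percentage into an equivalent number of fully used GPUs. The slide does not say how "twice as fast" was measured, so this arithmetic covers only the two efficiency numbers; the function name is illustrative.

```python
def effective_gpus(num_gpus, scaling_efficiency):
    """Speedup over one GPU, assuming efficiency = speedup / num_gpus."""
    return num_gpus * scaling_efficiency


stock = effective_gpus(256, 0.65)      # stock TensorFlow
optimized = effective_gpus(256, 0.90)  # AWS-optimized TensorFlow

print(f"stock:     {stock:.1f} GPU-equivalents")
print(f"optimized: {optimized:.1f} GPU-equivalents")
print(f"ratio:     {optimized / stock:.2f}x")
```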
  36. Machine Learning using TensorFlow on K8s. Read training data; build training model; feed test data and match the expected output; report accuracy, improve with each run. Download Keras-consumable Fashion-MNIST training and test data; run 40 epochs on the model; export the model to S3 bucket
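The steps on slide 36 can be sketched with the Keras API that ships with TensorFlow. This is a minimal illustration, not the code used in the talk: the network architecture, the optimizer, and the local file name are assumptions, and the upload to an S3 bucket is left out. TensorFlow is imported inside the function so the module can be loaded on a machine without it.

```python
EPOCHS = 40  # from the slide
MODEL_PATH = "fashion_mnist.h5"  # hypothetical local file name


def train_and_export(epochs=EPOCHS, model_path=MODEL_PATH):
    """Train a small Fashion-MNIST classifier, report test accuracy, save the model."""
    import tensorflow as tf  # deferred: only needed when training actually runs

    # Download Keras-consumable training and test data, scaled to [0, 1].
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Build the model: 28x28 grayscale input, 10 output classes.
    # Assumption: this tiny dense network is only a placeholder architecture.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

    # Train, then feed the test data and compare against the expected labels.
    model.fit(x_train, y_train, epochs=epochs)
    _, accuracy = model.evaluate(x_test, y_test, verbose=0)

    # Save locally; copying the file to S3 would be a separate step.
    model.save(model_path)
    return accuracy
```

Calling `train_and_export()` downloads the dataset and runs the full 40 epochs, so it is best run inside a training Pod rather than on import.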
  37. Apache MXNet
  38. Advantages of Kubeflow on AWS (some names are missing from the transcript and marked […]). EKS cluster provision with […]; external traffic with […]; […] to manage Lustre file system; centralized and unified K8s logs in […]; TLS and Auth with […] and […]; […] for your K8s API server endpoint; detect GPU instance and install […]. kubeflow.org/docs/aws
  39. Distributed training using Horovod. Distributed training framework for TensorFlow, Keras, PyTorch, and MXNet. Named after a traditional Russian dance where participants dance in a circle with linked hands
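The circle image on slide 39 fits ring-allreduce, the communication pattern Horovod's documentation describes for combining gradients across workers. The sketch below is a single-process simulation of that pattern, not Horovod's API: each worker holds one number per chunk, passes chunks to its neighbour around the ring, and every worker ends up with the element-wise sum. The function name and sample values are illustrative.

```python
def ring_allreduce(worker_chunks):
    """Simulate ring-allreduce (sum) for n workers holding n chunks each."""
    n = len(worker_chunks)
    data = [list(chunks) for chunks in worker_chunks]

    # Phase 1, scatter-reduce: after n - 1 steps, worker r holds the
    # complete sum for chunk (r + 1) % n.
    for step in range(n - 1):
        # Collect all messages first so the sends happen "simultaneously".
        sends = [(r, (r - step) % n, data[r][(r - step) % n]) for r in range(n)]
        for r, chunk, value in sends:
            data[(r + 1) % n][chunk] += value

    # Phase 2, allgather: circulate the completed chunks around the ring.
    for step in range(n - 1):
        sends = [(r, (r + 1 - step) % n, data[r][(r + 1 - step) % n]) for r in range(n)]
        for r, chunk, value in sends:
            data[(r + 1) % n][chunk] = value

    return data


gradients = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
reduced = ring_allreduce(gradients)
print(reduced)  # every worker ends with [111, 222, 333]
```

Each worker only ever talks to its neighbour, which is why the per-worker traffic stays roughly constant as workers are added.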
  40. Machine Learning pipeline. Choose and optimize your ML algorithm; set up and manage environments for training; deploy model in production; collect & prepare training data; train and tune model (trial and error); scale & manage environment in production
  41. Machine Learning pipeline for K8s. Choose and optimize your ML algorithm: linear regression, decision tree, BYOA; set up and manage environments for training: GPU- and CPU-based clusters, *operators (TensorFlow, MXNet, …); deploy model in production: TensorFlow Serving, MXNet Model Server, Seldon, …; collect & prepare training data: EMR, Redshift, S3; train and tune model (trial and error): TensorFlow, MXNet, PyTorch, Keras, …; scale & manage environment in production: EKS
  42. Machine Learning pipeline using SageMaker. Choose and optimize your ML algorithm: built-in high performance algorithms; set up and manage environments for training: one-click training; deploy model in production: one-click deployment; collect & prepare training data: prebuilt notebooks for common problems; train and tune model (trial and error): optimization; scale & manage environment in production: fully managed, auto-scaling, health and security checks
  43. References
