
Build, train, and deploy ML models with Amazon SageMaker - AIM302 - New York AWS Summit

Amazon SageMaker is a fully managed service that enables developers and data scientists to quickly and easily build, train, and deploy machine learning models at any scale. In this workshop, you learn how to build an ML model using Amazon SageMaker's built-in algorithms and frameworks. You then train the model with automatic model tuning to quickly reach a high level of accuracy. Finally, you deploy the model to production, where it can start serving predictions. By the end, you will understand how Amazon SageMaker removes the complexity and barriers that typically slow down developers working with ML.


  1. Building, training, and deploying ML models with Amazon SageMaker (AIM302). Emily Webber, Machine Learning Specialist, Amazon Web Services. © 2019, Amazon Web Services, Inc. or its affiliates. All rights reserved.
  2. For this workshop, you need • A personal laptop • An AWS account • Access to Amazon SageMaker, Amazon S3, and Amazon ECS • Credits will be provided to offset the cost of AWS services charged to your account for this workshop
  3. (Image slide: 1837 and 1969)
  4. Our mission at AWS: Put machine learning in the hands of every developer
  5. The Amazon ML stack: Broadest & deepest set of capabilities • AI services: easily add intelligence to applications without machine learning skills (vision | documents | speech | language | chatbots | forecasting | recommendations) • ML services: build, train, and deploy machine learning models quickly and easily (data labeling | pre-built algorithms & notebooks | one-click training and deployment) • ML frameworks & infrastructure: flexibility & choice, highest-performing infrastructure (support for ML frameworks | compute options purpose-built for ML)
  6. Amazon SageMaker • Notebook instances with 200+ examples • 17 built-in algorithms • ML training service • Hyperparameter tuning service • ML hosting service • Batch transform (inferencing and data transformation) • SDKs: Python and Spark • Documentation, whitepapers, and blog posts
  7. Problem statement: Healthcare insurance fraud is a pressing problem that causes substantial and increasing costs in medical insurance programs. Because of the large volume of claims submitted, reviewing individual claims is difficult, which encourages the use of automated pre-payment controls and better post-payment decision-support tools to enable subject-matter-expert analysis. We will demonstrate the application of unsupervised outlier-detection techniques to a minimal set of metrics available in the CMS Medicare inpatient claims from 2008.
  8. Dataset: CMS Medicare inpatient claims from 2008 • Each record is an inpatient claim incurred by a 5% sample of Medicare beneficiaries • Beneficiary identities are not provided • Zip codes of the facilities where patients were treated are not provided • The file contains eight (8) variables: one primary key and seven analytic variables • A data dictionary required to interpret the codes in the dataset is provided
  9. Data variables
  10. Techniques and algorithms used in the workshop • Outlier detection • Word embedding and Word2Vec • Principal component analysis • Mahalanobis distance calculation
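As a quick illustration of the last item, here is a minimal NumPy sketch (an illustration, not the workshop code) that computes the Mahalanobis distance of each record from the mean of the data; the records with the largest distances are the outlier candidates.

```python
import numpy as np

def mahalanobis_distances(X):
    """Mahalanobis distance of each row of X from the column means."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)          # covariance of the variables
    cov_inv = np.linalg.inv(cov)
    diff = X - mu
    # d_i = sqrt((x_i - mu)^T  S^-1  (x_i - mu)) for every row i
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, cov_inv, diff))

# Toy data: 99 typical points plus one planted outlier at the end.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(99, 3)), [10.0, 10.0, 10.0]])
d = mahalanobis_distances(X)
print(int(np.argmax(d)))  # 99: the planted outlier is the most distant point
```

Ranking by Mahalanobis distance rather than plain Euclidean distance accounts for the scale of, and correlation between, the variables.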
  11. Outlier detection: "Observation which deviates so much from other observations as to arouse suspicion it was generated by a different mechanism" (Hawkins, 1980). Challenges: • Modeling normal objects and outliers effectively; the border between data normality and abnormality (outliers) is often not clear-cut • Outlier detection methods can be application-specific; for example, in clinical data a small deviation may be an outlier, while in a marketing application a large deviation is required to justify an outlier • Noise in data may be present as deviations in attribute values or even as missing values; noise may hide an outlier or may flag a deviation as an outlier • Justifying an outlier from an understandability point of view may be difficult
  12. Word embedding and Word2Vec. Word embedding: • Texts are converted into numbers, and there may be different numerical representations of the same text • Many techniques exist; CBOW (continuous bag of words) and skip-gram are popular and effective for large corpora of documents • For example, CBOW predicts the probability of a word given its context. Word2Vec: • Shallow neural networks that map word(s) to a target variable, which is also a word (or words) • The learned weights act as word-vector representations
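To make the CBOW idea concrete, here is a tiny NumPy sketch of a single forward pass (an illustration with a toy vocabulary, not the workshop code or a trained model): the embeddings of the context words are averaged, projected through an output matrix, and passed through a softmax to give a probability for each candidate center word.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["claim", "provider", "payment", "inpatient", "code"]  # toy vocabulary
V, D = len(vocab), 4            # vocabulary size, embedding dimension

W_in = rng.normal(0, 0.1, (V, D))   # input embeddings (one row per word)
W_out = rng.normal(0, 0.1, (D, V))  # output projection to vocabulary scores

def cbow_probs(context_ids):
    """P(center word | context) in a CBOW model: average the context
    embeddings, project to one score per vocabulary word, softmax."""
    h = W_in[context_ids].mean(axis=0)   # averaged context vector
    scores = h @ W_out
    exp = np.exp(scores - scores.max())  # numerically stable softmax
    return exp / exp.sum()

p = cbow_probs([vocab.index("claim"), vocab.index("payment")])
print(p.shape)  # (5,) -- one probability per vocabulary word, summing to 1
```

Training adjusts W_in and W_out so the true center word gets high probability; the rows of W_in are then the word vectors.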
  13. Principal component analysis. When: • Do you want to reduce the number of variables, but aren't able to identify variables to completely remove from consideration? • Do you want to ensure your variables are independent of one another? • Are you comfortable making your independent variables less interpretable? How: • A measure of how each variable is associated with one another (covariance matrix) • The directions in which our data are dispersed (eigenvectors) • The relative importance of these different directions (eigenvalues)
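The three "how" ingredients map directly onto a few lines of NumPy (a sketch of the standard technique, not the workshop code): compute the covariance matrix, eigendecompose it, and rank directions by eigenvalue.

```python
import numpy as np

def pca(X, k):
    """Project X onto its top-k principal components via the
    eigendecomposition of the covariance matrix."""
    Xc = X - X.mean(axis=0)                 # center the data
    cov = np.cov(Xc, rowvar=False)          # how variables associate
    eigvals, eigvecs = np.linalg.eigh(cov)  # directions + their importance
    order = np.argsort(eigvals)[::-1]       # largest variance first
    components = eigvecs[:, order[:k]]
    return Xc @ components, eigvals[order]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # 200 records, 5 variables
Z, ev = pca(X, 2)
print(Z.shape)  # (200, 2): the same records in 2 uncorrelated dimensions
```

The projected columns are uncorrelated by construction, which is what enables the Mahalanobis-style distance scoring used later in the workshop.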
  14. Isolation of different environments: Notebook instance | Training | Inference. Amazon SageMaker provides a clean separation between your Jupyter notebook instance, training environment instances, and inference environment instances by launching a new stack on new machines for each of the following functions. • Notebook instance: your IDE environment for writing scripts and testing your algorithm on a small subset of data; choose smaller instances for this work • Training: the training environment created when you make a fit call using the SageMaker SDK from within a Jupyter notebook; choose one or more larger instances to train your learning algorithm on a large dataset; the environment is terminated automatically when training completes • Inference: the hosting environment for your trained models; you can scale it out using the automatic scaling feature and choose smaller or larger instances to meet your demand for inference
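The fit-then-deploy flow described above looks roughly like the following SageMaker Python SDK (v2) sketch. This is a hedged outline, not the workshop's notebook: the role ARN, bucket name, instance types, and PCA hyperparameter values are placeholders, and running it requires an AWS account and incurs charges, so the calls are wrapped in a function rather than executed here.

```python
def train_and_deploy(role_arn, bucket):
    """Sketch of the SageMaker training -> hosting flow.
    role_arn and bucket are placeholders; calling this uses AWS resources."""
    import sagemaker
    from sagemaker.estimator import Estimator

    session = sagemaker.Session()
    estimator = Estimator(
        # Built-in PCA algorithm image for the current region.
        image_uri=sagemaker.image_uris.retrieve("pca", session.boto_region_name),
        role=role_arn,
        instance_count=1,
        instance_type="ml.m5.xlarge",          # training environment instances
        output_path=f"s3://{bucket}/output",
        sagemaker_session=session,
    )
    # Example hyperparameters for the built-in PCA algorithm (values illustrative).
    estimator.set_hyperparameters(feature_dim=7, num_components=3,
                                  mini_batch_size=200)
    estimator.fit({"train": f"s3://{bucket}/train"})  # launches the training stack
    # Hosting environment: separate instances behind a managed endpoint.
    predictor = estimator.deploy(initial_instance_count=1,
                                 instance_type="ml.t2.medium")
    return predictor
```

Note how the notebook, the `fit` call, and the `deploy` call each target a different set of instances, which is exactly the environment isolation the slide describes.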
  15. Complete the workshop: https://tinyurl.com/y3ch2kjg
  16. Thank you!
