Artificial Intelligence and Machine Learning 101 - AWS Federal Pop-Up Loft

In this course, learn how to solve a real-world use case with machine learning and produce actionable results using Amazon SageMaker. This course teaches you how to use Amazon SageMaker to cover the different stages of the typical data science process, from analyzing and visualizing a data set, to preparing the data and feature engineering, down to the practical aspects of model building, training, tuning, and deployment.

  1. 1. Artificial Intelligence & Machine Learning 101 Dr. Thomas Ferleman ferleman@amazon.com
  2. 2. Machine Learning Primer
  3. 3. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Module objectives • Understand what machine learning is • Identify approaches to solving a machine learning problem • Identify and prepare datasets • Select a model • Distinguish supervised from unsupervised learning • Evaluate model performance © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  4. 4. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE© 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. What is machine learning? 4
  5. 5. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE© 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. Jeff Bezos on Machine Learning 5
  6. 6. The machine learning journey (source: "What is the difference between AI, machine learning and deep learning?", Meenal Dhande) • Artificial Intelligence: any technique that enables computers to mimic human intelligence, using logic, if-then rules, decision trees, and machine learning (including deep learning). • Machine Learning: a subset of AI that includes abstruse statistical techniques that enable machines to improve at tasks with experience. The category includes deep learning. • Deep Learning: the subset of machine learning composed of algorithms that permit software to train itself to perform tasks, like speech and image recognition, by exposing multilayered neural networks to vast amounts of data.
  7. 7. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE What is machine learning? • Using technology to discover trends and patterns and compute mathematical predictive models based on factual past data • Past data, statistics and probability theory are key tools used to build machine learning models and make predictions. © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. Where traditional business analytics aims at answering questions about past events, machine learning aims at answering questions about the possibilities of future events 7
  8. 8. Timeline of machine learning • 1950: The Learning Machine (Alan Turing) • 1952: Machine Playing Checkers (Arthur Samuel) • 1957: Perceptron (Frank Rosenblatt) • 1979: Stanford Cart • 1986: Backpropagation (D. Rumelhart, G. Hinton, R. Williams) • 1997: Deep Blue Beats Kasparov • 2011: Watson Wins Jeopardy • 2012: Google NN recognizing cat in YouTube • 2014: Facebook DeepFace, Amazon Echo • 2016: DeepMind Wins Go
  9. 9. The Perfect Storm: MATH + DATA + COMPUTE • Data per day processed by Google: 24 petabytes • Tweets per day: 50 million • Minutes spent on Facebook each month: 700 billion • Data sent and received by mobile users: 1.3 exabytes • Products ordered on Amazon per second: 72.9 items • Data consumed by households every day: 375 megabytes • Video uploaded to YouTube every minute: 20 hours • Number of emails sent every second: 2.9 million [Figure: data volumes in exabytes through the 2020s, by source: Sensors & Devices, Social Media, VoIP, Enterprise Data] [Figure: Pace of Innovation, chart values 24, 48, 61, 82, 159, 280, 516, 722, 1017, 1430]
  10. 10. Traditional ML vs. Deep Learning [Figure: two pipelines, one labeled "Traditional ML" and one labeled "Deep learning", each ending in "Output: Knight"; stages shown are Feature Extraction and Model]
  11. 11. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE The Building Blocks of Machine Learning
  12. 12. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE A Flywheel for Data Click stream User activity Generated content Purchases Clicks Likes Sensor data Object Storage Databases Data warehouse Streaming analytics BI Hadoop Spark/Presto Elasticsearch More Users More Data Better Analytics Better Products Machine Learning Artificial Intelligence Deep Learning
  13. 13. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Data Driven Development Here and now Real-time processing and dashboards Predictions Enable smart applications Retrospective Analysis and reporting
  14. 14. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE The Business Importance of Machine Learning
  15. 15. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Starting with the Data Clean Train and Test Select and Engineer Visualize ModelData
  16. 16. Components of Machine Learning Machine learning encompasses methods & systems that: • Adapt • Optimize • Summarize • Extract • Predict Your data + machine learning = smart applications
  17. 17. Application Areas (application domain: use cases) • Personalization: recommending content; loading predictive content; improving user experience • Fraud detection: detecting fraudulent transactions; filtering spam emails; flagging suspicious reviews • Targeted marketing: matching customers and offers; choosing marketing campaigns; cross-selling and up-selling
  18. 18. Application Areas (Contd.) (application domain: use cases) • Customer support: predictive routing of customer emails; social media listening • Content classification: categorizing documents; matching hiring managers and resumes • Churn prediction: finding customers who are likely to stop using the service; upgrading targeting
  19. 19. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Types of Machine Learning
  20. 20. Types of Machine Learning Problems • Supervised Learning: Classification, Regression • Unsupervised Learning: Clustering, Dimensionality Reduction • Reinforcement Learning
  21. 21. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Types of Machine Learning Problems (Contd.) Knight Bishop King Knight Known Historical Data INPUT LABELS Training Dataset Supervised Learning
  22. 22. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Types of Machine Learning Problems (Contd.) Supervised Learning Unseen/New Input Prediction Model Training Dataset Supervised Learning
  23. 23. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Types of Machine Learning Problems (Contd.) Unsupervised Learning
  24. 24. Types of Machine Learning Problems (Contd.) Unsupervised Learning: Anomaly Detection • Data from freeway traffic sensors • Sensor data includes: sensor ID; year and day of the year; day of the week; time; occupancy (count); average speed (mph) • Using clustering to predict which sensor might go bad [Figure: average speed plotted against hours]
  25. 25. Case Study: Zillow. Zillow provides online home information to tens of millions of buyers and sellers every day. Challenges • Provide timely home valuations for all new homes. • Performs machine learning jobs in hours instead of a day. • Scales storage and compute capacity on demand. Solutions • Runs Zestimate, Zillow’s machine learning based home-valuation tool, on AWS. • Gives customers more accurate data on more than 100 million homes. “We can compute Zestimates in seconds, as opposed to hours, by using Amazon Kinesis Streams and Spark on Amazon EMR.” Jasjeet Thind, Vice President of Data Science and Engineering, Zillow
  26. 26. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE CRISP-DM Methodology
  27. 27. Defining CRISP-DM: the Cross Industry Standard Process for Data Mining • Is the current de facto process for doing data science. • Highlights the cyclical and iterative nature of data science applications. CRISP-DM Process phases: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment
  28. 28. CRISP-DM Process Phase 1: Business Understanding Don’t dive into the data immediately! First, understand the goals by: • Determining business objectives. • Determining the ML problem category. • Determining the factors for measuring success. • Producing a preliminary project plan. Teams: Business, front-end
  29. 29. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE CRISP-DM Process Exploring the data provides us with necessary information, like: • Data quality and processing. • Interesting patterns in the data. • Likely paths forward after the modeling has started. Teams: Data engineering Phase 2: Understanding the Data Business Understanding Data Understanding Data Preparation Modeling Evaluation Deployment
  30. 30. CRISP-DM Process Phase 2: Understanding the Data (Contd.) • Old tools: Warehouse + SQL, Tableau, Python • New tools: Amazon Athena, Amazon QuickSight
  31. 31. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE CRISP-DM Process Data Preparation: • Normalization • Feature selection • Feature extraction Modeling • Training a model and receiving an output Teams: Data analysts/data science Phase 3-4: Data Preparation and Modeling Business Understanding Data Understanding Data Preparation Modeling Evaluation Deployment
  32. 32. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE CRISP-DM Process Services and Tools:Phase 3-4: Data Preparation and Modeling (Contd.) Business Understanding Data Understanding Data Preparation Modeling Evaluation Deployment Amazon EMR Amazon EC2
  33. 33. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE CRISP-DM Process Tying back to your original business objectives: • What are the false positive and false negative rates? • Is the performance acceptable? • Is it actually deployable? Teams: Data analysts/data science Phase 5: Evaluation Business Understanding Data Understanding Data Preparation Modeling Evaluation Deployment
  34. 34. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE CRISP-DM Process Services and Tools:Phase 5: Evaluation (Contd.) Business Understanding Data Understanding Data Preparation Modeling Deployment Amazon EMR Amazon EC2 Evaluation
  35. 35. CRISP-DM Process Phase 6: Deployment Teams: DevOps Services Used: AWS Lambda, AWS Batch, AWS CodeDeploy, AWS CodePipeline, Amazon ECS, AWS IoT
  36. 36. Why is Machine Learning so hard?
  37. 37. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE A Variety of Skills are Necessary ML Scientist Applied Scientist Research Scientist Data Scientist Data Engineer Software Engineer Science Math; statistics; ML algorithms Engineering ML libraries; data collection tools; programming languages ML Scientist Applied Scientist Research Scientist Data Scientist Business Intelligence Engineer Data Engineer Software Engineer Dev Manager Technical Program Manager
  38. 38. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Challenge 1: Expertise • Limited supply of domain experts (Data scientists and Machine Learning engineers) who can extract meaningful information from the data. • Expensive to hire or outsource.
  39. 39. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Challenge 2: Scale Building and scaling machine learning technology • Has many choices for tools, but few are good. • Is difficult to use and scale. • Has many moving pieces, which leads to custom solutions every time.
  40. 40. Challenge 3: Operationalization • Working with complex and error-prone data workflows, custom platforms, and APIs. • Spending significant time on managing the model lifecycle.
  41. 41. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Challenge 4: Complexity of Data • Large volumes of high quality data • Manual process of cleaning, labeling, and engineering
  42. 42. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Challenge 5: Cost • Purchasing and maintaining hardware is complex and typically comes at a fixed cost • Varying CPU, GPU, memory, and networking capacity
  43. 43. The Result Building a machine learning application can be… • Risky • Expensive • Slow …but what if there were a better way?
  44. 44. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Marketing offer on a new product 44© 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  45. 45. Option 1: build a rules engine • Input (columns: Age, Gender, Purchase Date, Items): 30, M, 3/1/17, Toy; 40, M, 1/3/17, Book; … • A human programmer writes the rules: Rule 1: 15 < age < 30; Rule 2: Bought Toy = Y, Last Purchase < 30 days; Rule 3: Gender = 'M', Bought Toy = 'Y'; Rule 4: …; Rule 5: … • Output (same columns): 30, M, 3/1/17, Toy; …
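A minimal sketch of what such a hand-written rules engine might look like, using the three rules listed on the slide. How the rules combine (here, any one rule firing triggers the offer), the month/day/year reading of the dates, the reference date, and all function and field names are assumptions made for illustration.

```python
from datetime import date

def should_send_offer(customer, today):
    """Apply hand-written rules; the rules are OR-ed together (an assumption)."""
    days_since_purchase = (today - customer["purchase_date"]).days
    bought_toy = customer["item"] == "Toy"
    rule_1 = 15 < customer["age"] < 30                   # Rule 1: 15 < age < 30
    rule_2 = bought_toy and days_since_purchase < 30     # Rule 2: bought a toy in the last 30 days
    rule_3 = customer["gender"] == "M" and bought_toy    # Rule 3: male and bought a toy
    return rule_1 or rule_2 or rule_3

# The two input rows from the slide, reading dates as month/day/year (an assumption).
customers = [
    {"age": 30, "gender": "M", "purchase_date": date(2017, 3, 1), "item": "Toy"},
    {"age": 40, "gender": "M", "purchase_date": date(2017, 1, 3), "item": "Book"},
]
# Hypothetical "today" chosen so that the first purchase is 14 days old.
decisions = [should_send_offer(c, today=date(2017, 3, 15)) for c in customers]
```

Every new product or customer segment means another rule edited by hand.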
  46. 46. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Problems with hand designed rules © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. Adaptability Scalability Closed Loop 46
  47. 47. Option 2: learn the business rules from data • Historical Purchase Data (Training Data; columns: Age, Gender, Purchase Date, Items): 30, M, 3/1/17, Toy; 40, M, 1/3/17, Book; … • A Learning Algorithm produces a Model from the training data • Input, New Unseen Data (columns: Age, Gender, Items): 35, F; 39, M, Toy • Output: Prediction
  48. 48. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE© 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. We call this approach machine learning 48
  49. 49. When to use machine learning? • Use machine learning when you can’t code it • Complex tasks where deterministic solutions don’t suffice • E.g. recognizing speech/images • Use machine learning when you can’t scale it • Replace repetitive tasks needing human-like expertise • E.g. recommendations, spam, fraud detection, machine translation • Use machine learning when you have to adapt/personalize • E.g. recommendation and personalization • Use machine learning when you can’t track it • E.g. automated driving
  50. 50. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE© 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. Identifying and preparing datasets
  51. 51. Importance of good training data [Figure labels: Healthcare Treatment Decision, Asthma, Pneumonia]
  52. 52. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE What are you solving? 52 Classification Problem Clustering Problem Ranking Problem Regression Problem © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. Or
  53. 53. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE What are your use cases? • Do you have large ERP systems with volumes of data? • Is your data locked away in jail? • Is your data spread across many siloes? • Do you even have data that can be accessed? • Do you have warehouses full of paper documents? 53© 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  54. 54. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Collect data and label • Break down data silos! • Reduce fragmented data • Collect as many features as possible • This will be iterative • As Data Scientists develop the model, they will decrease the feature space or discover new features that need to be added • Will this model require labeled data? • Human factors considerations • Recommendations may also require labeled data © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. 54
  55. 55. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Format the data • Are the dates the same? • 9 October 2018, 10/9/2018, Oct 9 18 • Are dollar values formatted the same? • $1.00, 1.00, one dollar, 1 • Acronyms • AWS, Amazon Web Services • Integers vs. decimals (doubles) • 1, 1.0, 1.00 © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. 55
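The formatting questions above can be sketched as a small normalization step that maps each variant on the slide to one canonical form. This is a minimal illustration, not a general parser: the list of accepted date formats, the word-to-amount lookup, and the function names are assumptions, and month names are assumed to be in English.

```python
from datetime import datetime
from decimal import Decimal

# Candidate formats, tried in order; these cover only the three variants on the slide.
DATE_FORMATS = ("%d %B %Y", "%m/%d/%Y", "%b %d %y")

def normalize_date(text):
    """Return an ISO 8601 date string for any of the known formats."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")

# Hypothetical lookup for amounts written out in words.
WORD_AMOUNTS = {"one dollar": Decimal("1.00")}

def normalize_dollars(text):
    """Return the amount as a Decimal with two decimal places."""
    cleaned = text.strip().lower()
    if cleaned in WORD_AMOUNTS:
        return WORD_AMOUNTS[cleaned]
    return Decimal(cleaned.lstrip("$")).quantize(Decimal("0.01"))

dates = [normalize_date(d) for d in ("9 October 2018", "10/9/2018", "Oct 9 18")]
amounts = [normalize_dollars(a) for a in ("$1.00", "1.00", "one dollar", "1")]
```

Reading "10/9/2018" as month/day/year is itself a choice; a dataset mixing US and European conventions would need extra information to disambiguate.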
  56. 56. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE© 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. What features result in the best model performance?
  57. 57. Dropping attributes • Too many missing values • Too much variance • Noise • Unique to each item: Name, Passenger ID. Output of data.isnull().sum(): PassengerId 0; Survived 0; Pclass 0; Name 0; Sex 0; Age 177; SibSp 0; Parch 0; Ticket 0; Fare 0; Cabin 687; Embarked 2; dtype: int64
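A plain-Python sketch of the same idea as the `data.isnull().sum()` call on the slide: count missing values per column, then flag columns that are mostly empty or merely identify a row. The five sample rows reuse values shown elsewhere in the deck; the `PassengerId` values, the 50% threshold, and the function names are assumptions for illustration.

```python
# Missing values are represented as None in this sketch.
rows = [
    {"PassengerId": 1, "Age": 27.0, "Cabin": None, "Fare": 13.0},
    {"PassengerId": 2, "Age": 19.0, "Cabin": "B42", "Fare": 30.0},
    {"PassengerId": 3, "Age": None, "Cabin": None, "Fare": 23.45},
    {"PassengerId": 4, "Age": 26.0, "Cabin": "C148", "Fare": 30.0},
    {"PassengerId": 5, "Age": 32.0, "Cabin": None, "Fare": 7.75},
]

def missing_counts(rows):
    """Count None values per column, like a per-column null count."""
    return {col: sum(1 for r in rows if r[col] is None) for col in rows[0]}

def columns_to_drop(rows, max_missing_fraction=0.5, id_columns=("PassengerId",)):
    """Flag columns with too many missing values, plus pure identifiers."""
    counts = missing_counts(rows)
    too_sparse = [c for c, n in counts.items() if n / len(rows) > max_missing_fraction]
    return sorted(set(too_sparse) | set(id_columns))

counts = missing_counts(rows)
drop = columns_to_drop(rows)
```

On the full dataset shown on the slide, the same reasoning singles out `Cabin` (687 missing) while `Age` (177 missing) is a candidate for imputation instead.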
  58. 58. Replacing missing values • Constant value, distinct from all other values (e.g. -1 or 0) • Remove missing values (drop columns or rows) • A value randomly selected from another record • A mean, median or mode value for the column • A value estimated by another predictive model. Sample rows (Survived | Name | Sex | Age | Ticket | Fare | Cabin): 0 | Montvila, Rev. Juozas | male | 27 | 211536 | 13 | NaN; 1 | Graham, Miss. Margaret Edith | female | 19 | 112053 | 30 | B42; 0 | Johnston, Miss. Catherine Helen "Carrie" | female | NaN | W./C. 6607 | 23.45 | NaN; 1 | Behr, Mr. Karl Howell | male | 26 | 111369 | 30 | C148; 0 | Dooley, Mr. Patrick | male | 32 | 370376 | 7.75 | NaN
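Two of the strategies listed above, sketched on the Age column of the sample rows: replacing a missing value with a sentinel constant, and replacing it with the column median. Missing values are represented as `None` here, and the function names are illustrative assumptions.

```python
from statistics import median

# Age column from the five sample rows; None stands in for NaN.
ages = [27, 19, None, 26, 32]

def fill_with_constant(values, constant=-1):
    """Replace missing entries with a sentinel distinct from real values."""
    return [constant if v is None else v for v in values]

def fill_with_median(values):
    """Replace missing entries with the median of the known entries."""
    known = [v for v in values if v is not None]
    fill = median(known)
    return [fill if v is None else v for v in values]

ages_constant = fill_with_constant(ages)
ages_median = fill_with_median(ages)
```

A sentinel keeps the fact that the value was missing visible to the model, while the median keeps the column's distribution closer to the observed one; which is preferable depends on the algorithm.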
  59. 59. Feature engineering: minimize redundant features • Continuous to categorical: Age (Child 0-12 yrs, Teenager 13-19 yrs, Adult 20-59 yrs, Senior 60+ yrs); Fare ranges • Combining features: Siblings/Parents, combine into Family; Names, combine into Surnames
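The transformations on this slide can be sketched as three small functions: bucketing age into the four groups listed, combining the sibling and parent counts into one family feature, and extracting a surname from a name. Counting the passenger in the family size and the "Surname, Title. Given names" name layout are assumptions; the function names are illustrative.

```python
def age_group(age):
    """Map a continuous age to the categories listed on the slide."""
    if age <= 12:
        return "Child"
    if age <= 19:
        return "Teenager"
    if age <= 59:
        return "Adult"
    return "Senior"

def family_size(sibsp, parch):
    """Combine siblings/spouses and parents/children into one feature.

    The +1 counts the passenger themselves (an assumption).
    """
    return sibsp + parch + 1

def surname(name):
    """Take the text before the first comma as the surname."""
    return name.split(",")[0].strip()

groups = [age_group(a) for a in (5, 13, 27, 60)]
```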
  60. 60. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE© 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. Model selection
  61. 61. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE What are you solving? 61 Classification Problem Clustering Problem Ranking Problem Regression Problem © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. Or
  62. 62. Supervised learning • The algorithm is given a set of training examples where the data and target are known. • The algorithm predicts the target value for new datasets, containing the same attributes. • Human intervention and validation required. Example: photo classification and tagging (Building, Car, or Person)
  63. 63. Supervised learning: how machines learn [Figure: labeled training data (inputs with labels such as Car and Building) is fed to a machine learning algorithm; its prediction is compared with the label, and the model is adjusted]
  64. 64. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Unsupervised learning 64 Input Prediction • Algorithm is given mass amounts of data, and it must find patterns and relationships between the data. • Algorithm draws inferences from datasets. • Human intervention is not required. Example: Auto-classification of documents based on context © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. Machine Learning Algorithm
  65. 65. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Machine learning use cases • Supervised Learning • Classification • Spam detection • Customer churn prediction • Regression • House price prediction • Demand forecasting • Unsupervised Learning • Clustering • Customer segmentation © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. 65
  66. 66. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE© 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. Model training
  67. 67. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Split training data © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. All Labeled Dataset Training Data 67 70% 30%
  68. 68. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Train with training data © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. All Labeled Dataset Training Data Training 68 70% 30% Trial Model
  69. 69. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Split the test data © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. All Labeled Dataset Training Data Training Test Data 69 70% 30% Trial Model
  70. 70. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Model evaluation © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. All Labeled Dataset Training Data Training Test Data Evaluation Result 70 70% 30% Trial Model
  71. 71. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Performance measurement © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. All Labeled Dataset Training Data Training Test Data Evaluation Result Measure Performance 71 70% 30% Trial Model
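The 70%/30% split shown across the last few slides can be sketched in a few lines. This is a minimal illustration, assuming a random shuffle before splitting and a fixed seed for repeatability; the function name and parameters are hypothetical.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the labeled data, then cut it into train and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

labeled = list(range(10))          # stand-in for ten labeled examples
train, test = train_test_split(labeled)
```

The model is trained only on `train`; `test` is held back so that the evaluation result reflects performance on examples the model has not seen.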
  72. 72. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE© 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. Evaluating model performance
  73. 73. Cross validation • Partition the labeled dataset into n subsamples (here five: k1, k2, k3, k4, k5) • Each iteration, leave one subsample out as test data • Iteration 1: train on k1, k2, k3, k4; test on k5 • Iteration 2: train on k2, k3, k4, k5; test on k1 • Iteration 3: train on k3, k4, k5, k1; test on k2 • Iteration 4: train on k4, k5, k1, k2; test on k3 • Iteration 5: train on k5, k1, k2, k3; test on k4
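A minimal sketch of the leave-one-subsample-out scheme, assuming the data is already shuffled. The order in which folds are held out differs from the slide's table, which does not change the result; the function name is illustrative.

```python
def k_fold_splits(items, k=5):
    """Yield (train, test) pairs, holding out one of k folds each time."""
    folds = [items[i::k] for i in range(k)]   # deal items into k folds
    for held_out in range(k):
        test = folds[held_out]
        train = [x for i, fold in enumerate(folds) if i != held_out for x in fold]
        yield train, test

data = list(range(10))
splits = list(k_fold_splits(data, k=5))
```

Averaging the evaluation metric over the five iterations gives a less noisy estimate than a single train/test split, at the cost of training five models.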
  74. 74. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Precision vs. accuracy 74© 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. High accuracy High precision Low accuracy High precision High accuracy Low precision Low accuracy Low precision
  75. 75. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Confusion matrix © 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. 75
  76. 76. Confusion matrix • True positives (TP): cases in which we predicted yes (they have the disease), and they do have the disease. • True negatives (TN): we predicted no, and they don't have the disease. • False positives (FP): we predicted yes, but they don't actually have the disease (also known as a "Type I error"). • False negatives (FN): we predicted no, but they actually do have the disease (also known as a "Type II error").
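The four cells defined above can be counted directly from paired actual and predicted labels. The sketch below also derives accuracy, precision, and recall from the counts; the eight example labels are made up for illustration, and 1 means "has the disease".

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (1 = yes, 0 = no)."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(1 for a, p in pairs if a == 1 and p == 1),
        "TN": sum(1 for a, p in pairs if a == 0 and p == 0),
        "FP": sum(1 for a, p in pairs if a == 0 and p == 1),
        "FN": sum(1 for a, p in pairs if a == 1 and p == 0),
    }

# Hypothetical labels for eight patients.
actual    = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

c = confusion_counts(actual, predicted)
accuracy = (c["TP"] + c["TN"]) / len(actual)      # share of all predictions that were right
precision = c["TP"] / (c["TP"] + c["FP"])         # share of predicted "yes" that were right
recall = c["TP"] / (c["TP"] + c["FN"])            # share of actual "yes" that were found
```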
  77. 77. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE© 2019 Amazon Web Services, Inc. or its Affiliates. All rights reserved. Example: training a linear regression model
  78. 78. What is linear regression? f(x) = W*x + b [Figure: Output Y (Price) plotted against Input X (size of house), with inputs x0 through xn]
  79. 79. Loss function (a.k.a. objective function) • Model: f(x) = W*x + b • Error for example j: f(X_j) - Y_j • Mean Square Error (MSE) function: $\mathrm{MSE} = \frac{1}{n}\sum_{j=1}^{n}\left(f(X_j) - Y_j\right)^2$ [Figure: Output Y (Price) plotted against Input X (size of house)]
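One way to train the model f(x) = W*x + b against a mean squared error loss is gradient descent. The slides do not name an optimizer, so gradient descent, the learning rate, the step count, and the toy data (generated from y = 2x + 1) are all assumptions in this minimal sketch.

```python
def predict(w, b, x):
    """The linear model f(x) = w*x + b."""
    return w * x + b

def mse(w, b, xs, ys):
    """Mean squared error over all examples."""
    return sum((predict(w, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit(xs, ys, learning_rate=0.05, steps=2000):
    """Minimize the MSE with plain gradient descent, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [predict(w, b, x) - y for x, y in zip(xs, ys)]
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))  # dMSE/dw
        grad_b = (2 / n) * sum(errors)                             # dMSE/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Toy data lying exactly on y = 2x + 1 (hypothetical house sizes and prices).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
w, b = fit(xs, ys)
final_loss = mse(w, b, xs, ys)
```

Each step moves W and b a small distance against the gradient of the loss, so the error shrinks until the line fits the points.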
  80. 80. Thank you © 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. For corrections or feedback on the course, please email us at: aws-course-feedback@amazon.com. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners.
  81. 81. Amazon SageMaker: Build, Train, and Deploy Machine Learning Models Quickly & Easily, at Scale
  82. 82. Our Approach to Machine Learning • Customer-focused: 90%+ of our ML roadmap is defined by customers • Pace of innovation: 200+ new ML launches and major feature updates last year • Breadth and depth: a wide range of AI and ML services • Multi-framework: support for the most popular frameworks • Security and analytics: deep set of security with robust encryption and analytics • Embedded R&D: customer-centric approach
  83. 83. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Introduction to Amazon SageMaker 3
  84. 84. © 2018, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark. The AWS ML Stack: broadest and deepest set of capabilities • AI Services: Vision (Rekognition Image, Rekognition Video, Textract); Speech (Polly, Transcribe); Language (Translate, Comprehend & Comprehend Medical); Chatbots (Lex); Forecasting (Forecast); Recommendations (Personalize) • ML Services: Amazon SageMaker (Ground Truth, Notebooks, Algorithms + Marketplace, Reinforcement Learning, Training, Optimization, Deployment, Hosting) • ML Frameworks + Infrastructure: Frameworks; Interfaces; Infrastructure (FPGAs, EC2 P3 & P3dn, EC2 G4, EC2 C5, Inferentia, Greengrass, Elastic Inference, DL Containers & AMIs)
  85. 85. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Amazon SageMaker: Build, Train, and Deploy ML Models at Scale Collect and prepare training data Choose and optimize your ML algorithm Train and Tune ML Models Set up and manage environments for training Deploy models in production Scale and manage the production environment 1 2 3 5
  86. 86. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Collect and prepare training data Choose and optimize your ML algorithm Train and Tune ML Models Set up and manage environments for training Deploy models in production Scale and manage the production environment 1 2 3 Amazon SageMaker: Build, Train, and Deploy ML Models at Scale 6
  87. 87. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Collect and prepare training data Choose and optimize your ML algorithm Train and Tune ML Models Set up and manage environments for training Deploy models in production Scale and manage the production environment Amazon SageMaker: Build, Train, and Deploy ML Models at Scale 7
  88. 88. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Collect and prepare training data Choose and optimize your ML algorithm Train and Tune ML Models Set up and manage environments for training Deploy models in production Scale and manage the production environment Amazon SageMaker: Build, Train, and Deploy ML Models at Scale 8
  89. 89. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Collect and prepare training data Choose and optimize your ML algorithm Train and Tune ML Models Set up and manage environments for training Deploy models in production Scale and manage the production environment Amazon SageMaker: Build, Train, and Deploy ML Models at Scale 9
  90. 90. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Collect and prepare training data Choose and optimize your ML algorithm Train and Tune ML Models Set up and manage environments for training Deploy models in production Scale and manage the production environment Amazon SageMaker: Build, Train, and Deploy ML Models at Scale 10
  91. 91. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Amazon SageMaker: Build, Train, and Deploy ML Models at Scale 11
  92. 92. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Machine Learning is Transforming F1 Racing 12
  93. 93. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Collect and prepare training data Choose and optimize your ML algorithm Train and tune models Set up and manage environments for training Deploy model in production Scale and manage the production environment 1 2 3 1 2 3 Amazon SageMaker: Build, Train, and Deploy ML Models at Scale 13
  94. 94. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Successful models require high-quality data 14
  95. 95. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Successful models require high-quality data 15
  96. 96. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Amazon SageMaker Ground Truth Build highly accurate training datasets and reduce data labeling costs by up to 70% using machine learning 16
  97. 97. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Raw Data Amazon SageMaker Ground Truth How it works 17
  98. 98. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Raw Data Human Annotations Amazon SageMaker Ground Truth How it works 18
  99. 99. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Raw Data Human Annotations Automatic Annotations Training Data Human Annotations Amazon SageMaker Ground Truth How it works 19
  100. 100. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Raw Data Human Annotations Automatic Annotations Training Data Human Annotations Amazon SageMaker Ground Truth How it works 20
  101. 101. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Mechanical Turk workers Private labeling workforce Third-party vendors Amazon SageMaker Ground Truth How it works 21
  102. 102. Label machine learning training data easily and accurately • Quickly label training data • Easily integrate human labelers • Get accurate results. Key features: automatic labeling via machine learning; ready-made and custom workflows; label management; private and public human workforce
  103. 103. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Collect and prepare training data Choose and optimize your ML algorithm Train and tune model Set up and manage environments for training Deploy model in production Scale and manage the production environment Amazon SageMaker Ground Truth Amazon SageMaker: Build, Train, and Deploy ML Models at Scale 23
  104. 104. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE AWS Marketplace for Machine Learning Over 200 algorithms and models that can be deployed directly to Amazon SageMaker 24
  105. 105. AWS Marketplace for Machine Learning: ML algorithms and models available instantly. Key features • Sellers: automatic labeling via machine learning; IP protection; automated billing and metering • Buyers: broad selection of paid, free, and open-source algorithms and models; data protection; discoverable on your AWS bill
  106. 106. Over 200 algorithms and models. Available algorithms & models: Natural Language Processing, Grammar & Parsing, Text OCR, Computer Vision, Named Entity Recognition, Video Classification, Speech Recognition, Text-to-Speech, Speaker Identification, Text Classification, 3D Images, Anomaly Detection, Text Generation, Object Detection, Regression, Text Clustering, Handwriting Recognition, Ranking. [Figure: selected vendors]
  107. 107. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Collect and prepare training data Choose and optimize your ML algorithm Train and tune models Set up and manage environments for training Deploy model in production Scale and manage the production environment Amazon EC2 P3 Instances Amazon SageMaker Ground Truth Amazon Elastic Inference AWS Marketplace for Machine Learning Amazon SageMaker: Build, Train, and Deploy ML Models at Scale 27
  108. 108. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Optimization is extremely complex 28
  109. 109. AMAZON CONFIDENTIAL/DO NOT DISTRIBUTE Amazon SageMaker Neo Train once, run anywhere with 2x the performance 29
  110. Amazon SageMaker Neo: train once, run anywhere
  111. Amazon SageMaker Neo: train once, run anywhere with 2x the performance. Key features: automatic optimization; broad framework support; get accuracy and performance; open-source device runtime and compiler, 1/10th the size of original frameworks; broad hardware support.
  112. Machine learning is a highly collaborative process every step of the way
  113. SageMaker Workflows. Experiment management: organize, track, and evaluate model training experiments with SageMaker Search.
  114. SageMaker Workflows. Experiment management: organize, track, and evaluate model training experiments with SageMaker Search. Automation: use AWS Step Functions to automate end-to-end workflows; integrate with Apache Airflow.
  115. SageMaker Workflows. Experiment management: organize, track, and evaluate model training experiments with SageMaker Search. Automation: use AWS Step Functions to automate end-to-end workflows; integrate with Apache Airflow. Collaboration: link GitHub, AWS CodeCommit, and self-hosted Git repositories to notebooks; clone public and private repositories; secure information with IAM, LDAP, and AWS Secrets Manager.
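To make the automation item more concrete, here is a rough sketch of what an end-to-end train-and-deploy workflow could look like as an Amazon States Language definition, built as a Python dictionary. The state names are hypothetical, the `Resource` strings follow the Step Functions service-integration naming pattern for SageMaker as best recalled and should be checked against the current documentation, and the per-task `Parameters` (job names, role ARNs, data locations) are deliberately left out.

```python
import json

# Hypothetical state names paired with service-integration resource strings.
# Assumption: verify these ARNs against the Step Functions documentation.
STEPS = [
    ("TrainModel", "arn:aws:states:::sagemaker:createTrainingJob.sync"),
    ("CreateModel", "arn:aws:states:::sagemaker:createModel"),
    ("CreateEndpointConfig", "arn:aws:states:::sagemaker:createEndpointConfig"),
    ("CreateEndpoint", "arn:aws:states:::sagemaker:createEndpoint"),
]

def build_state_machine(steps):
    """Chain Task states in order; the last state ends the workflow."""
    states = {}
    for index, (name, resource) in enumerate(steps):
        state = {"Type": "Task", "Resource": resource}
        # A real definition would also carry a "Parameters" block per task.
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {
        "Comment": "Illustrative train-and-deploy workflow",
        "StartAt": steps[0][0],
        "States": states,
    }

def execution_order(definition):
    """Follow the Next links from StartAt to confirm the chain is well formed."""
    order, name = [], definition["StartAt"]
    while True:
        order.append(name)
        state = definition["States"][name]
        if state.get("End"):
            return order
        name = state["Next"]

definition = build_state_machine(STEPS)
print(json.dumps(definition, indent=2))
```

Walking the `Next` links locally with `execution_order` is a cheap sanity check before a definition is handed to the service.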
  116. So what's next for machine learning? How do you teach machine learning models to make decisions when there is no training data?
  117. Introducing reinforcement learning (RL). Chart axes: complexity of decisions; amount of labeled training data required. Plotted: reinforcement learning (RL); supervised learning (ASR, computer vision); unsupervised learning (anomaly detection, identifying text topics).
  118. What is an RL environment? A representation of the real world; programmed to represent real-world conditions; enables interaction with a user or a computer program; dynamic, updating itself based on the interactions and programmed behavior.
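The properties listed on this slide map onto a small interface: hold state, accept an action from a program, and update in response. Below is a minimal sketch, assuming a made-up one-dimensional corridor world; the class name `CorridorEnv`, its reward values, and the `reset`/`step` method names are illustrative choices, not part of any AWS API.

```python
class CorridorEnv:
    """Toy environment: an agent walks a 1-D corridor toward a goal cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        """Start a new episode at the left end and return the initial state."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action (0 = left, 1 = right) and return (state, reward, done)."""
        move = 1 if action == 1 else -1
        # The world updates itself: walls clamp the position to the corridor.
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else -0.1
        return self.position, reward, done

env = CorridorEnv()
state = env.reset()
for _ in range(4):
    state, reward, done = env.step(1)  # keep walking right
print(state, reward, done)
```

Any program, whether a person at a keyboard or a learning algorithm, can drive this environment through `step`, which is the interaction the slide describes.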
  119. This makes RL applicable in many domains, not just gaming: robotics; industrial control; HVAC; autonomous vehicles; NLP; operations; finance; resource allocation; advertising; online content delivery.
  120. Reinforcement learning: achieve outcomes, not decisions. Examples: robotics; industrial controls; natural language systems; games.
  121. How does RL work? Simulation environment; scoring function; RL algorithm.
  122. How does RL work? Simulation environment; scoring function; RL algorithm. Extremely complex; expensive; effectively out of reach.
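The three components named on this slide can be shown together on a toy problem. The sketch below uses tabular Q-learning as the RL algorithm, which is a swapped-in textbook choice and not something the slides specify; the corridor world, the reward values, and the hyperparameters are all assumptions for illustration.

```python
import random

N_STATES, GOAL = 5, 4   # corridor cells 0..4, goal at the right end
ACTIONS = (0, 1)        # 0 = left, 1 = right

def simulate(state, action):
    """Simulation environment: return the next cell after taking an action."""
    move = 1 if action == 1 else -1
    return min(max(state + move, 0), N_STATES - 1)

def score(next_state):
    """Scoring function: reward reaching the goal, penalize every other step."""
    return 1.0 if next_state == GOAL else -0.1

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """RL algorithm: tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state = simulate(state, action)
            reward = score(next_state)
            done = next_state == GOAL
            # Bootstrap from the best next-state value unless the episode ended.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q_table = q_learning()
policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(policy)
```

No labeled examples are supplied anywhere: the agent learns which action to prefer in each cell purely from the scores returned by its own interactions.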
  123. Amazon SageMaker RL: new machine learning capabilities in Amazon SageMaker to build, train, and deploy with reinforcement learning
  124. Amazon SageMaker RL: reinforcement learning for every developer and data scientist. Key features: broad support for frameworks; broad support for simulation environments; 2D & 3D physics environments and OpenAI Gym support; support for Amazon Sumerian, AWS RoboMaker, and the open-source Robot Operating System (ROS) project; fully managed; example notebooks and tutorials.
  125. The challenges of inference in production: one size does not fit all; low utilization and high costs. How do we optimize resources and reduce costs?
  126. Amazon Elastic Inference: reduce deep learning inference costs by up to 75%
  127. Amazon Elastic Inference: reduce deep learning inference costs by up to 75%. Key features: lower inference costs; match capacity to demand; available from 1 to 32 TFLOPS per accelerator; integrated with Amazon EC2 and Amazon SageMaker; support for TensorFlow, Apache MXNet, and ONNX, with PyTorch coming soon; single- and mixed-precision operations.
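The "match capacity to demand" idea comes down to simple arithmetic: attach the smallest accelerator that covers the workload to a cheaper host, and compare the total with a dedicated GPU instance. The sketch below assumes a made-up catalog of accelerator sizes within the 1 to 32 TFLOPS range and entirely hypothetical hourly prices; none of these figures are AWS pricing.

```python
# Hypothetical catalog: accelerator size in TFLOPS -> made-up hourly price in USD.
ACCELERATORS = {1: 0.05, 2: 0.10, 4: 0.20, 8: 0.40, 16: 0.80, 32: 1.60}

def pick_accelerator(required_tflops):
    """Return the smallest accelerator size that meets the required throughput."""
    for size in sorted(ACCELERATORS):
        if size >= required_tflops:
            return size
    raise ValueError("demand exceeds the largest single accelerator")

def savings_fraction(gpu_hourly, host_hourly, accelerator_hourly):
    """Fraction saved by a CPU host plus accelerator versus a dedicated GPU instance."""
    return 1.0 - (host_hourly + accelerator_hourly) / gpu_hourly

# Illustrative workload needing 6 TFLOPS; all prices below are assumptions.
size = pick_accelerator(6)
savings = savings_fraction(
    gpu_hourly=2.00,
    host_hourly=0.20,
    accelerator_hourly=ACCELERATORS[size],
)
print(f"{size} TFLOPS accelerator, {savings:.0%} cheaper than the GPU instance")
```

The saving depends entirely on how much of a full GPU the model would have left idle, which is why the slide states it as an upper bound.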
  128. How does Elastic Inference work with SageMaker? SageMaker notebooks: prototype deployments with notebooks in local mode. SageMaker hosted endpoints: scale endpoints with low-cost Elastic Inference acceleration.
  129. Model support: Amazon EI-enabled TensorFlow Serving; Amazon EI-enabled Apache MXNet; ONNX, applied using Apache MXNet. Auto discovery of accelerators; IAM-based authentication. Available via the AWS Deep Learning AMIs, for download via S3, and automatically through SageMaker.
  130. Recap: Amazon SageMaker. Stages: collect and prepare training data; choose and optimize your ML algorithm; train and tune models; set up and manage environments for training; deploy models in production; scale and manage the production environment. Highlighted: Amazon EC2 P3 Instances; Amazon SageMaker RL; Amazon SageMaker Ground Truth; Amazon Elastic Inference; AWS Marketplace for Machine Learning; Amazon SageMaker Neo; single API call for deployment.
  131. The AWS ML stack: broadest and deepest set of capabilities. AI Services: vision (Rekognition Image, Rekognition Video, Textract); speech (Polly, Transcribe); language (Translate, Comprehend & Comprehend Medical); chatbots (Lex); forecasting (Forecast); recommendations (Personalize). ML Services: Amazon SageMaker (Ground Truth; notebooks; algorithms + Marketplace; reinforcement learning; training; optimization; deployment; hosting). ML Frameworks + Infrastructure: frameworks; interfaces; infrastructure (FPGAs; EC2 P3 & P3dn; EC2 G4; EC2 C5; Inferentia; Greengrass; Elastic Inference; DL Containers & AMIs).
  132. © 2019 Amazon Web Services, Inc. or its affiliates. All rights reserved. This work may not be reproduced or redistributed, in whole or in part, without prior written permission from Amazon Web Services, Inc. Commercial copying, lending, or selling is prohibited. For corrections or feedback on the course, please email us at: aws-course-feedback@amazon.com. For all other questions, contact us at: https://aws.amazon.com/contact-us/aws-training/. All trademarks are the property of their owners. Resources: SageMaker product page; SageMaker console; Ground Truth product page; Neo product page; SageMaker 10-minute tutorial; SageMaker-related blogs.
  133. Thank you
