Amazon SageMaker Ground Truth: Build High-Quality and Accurate ML Training Datasets

Successful machine learning models are built on high-quality training datasets. Labeling raw data to produce accurate training datasets takes a lot of time and effort, because sophisticated models can require thousands of labeled examples before they produce good results. Typically, the labeling task is distributed across a large number of people, adding significant overhead and cost. Join us as we introduce Amazon SageMaker Ground Truth, a new service that reduces this cost and complexity using a machine learning technique called active learning. Active learning reduces the time and manual effort required for data labeling by continuously training machine learning algorithms on labels provided by humans. By iterating on ambiguous data points, Ground Truth improves its ability to label data automatically, resulting in high-quality training datasets.
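The active-learning loop described in the abstract can be pictured as a routing step repeated every round: predictions the model is confident about become machine labels, and the most ambiguous items go to human labelers, whose answers are then used to retrain the model. Below is a minimal Python sketch of that routing step using uncertainty sampling, assuming a binary classifier that outputs one probability per item. The function names, the 0.95 confidence threshold, and the per-round human budget are hypothetical; this is not Ground Truth's API, and the talk does not describe its actual algorithm or thresholds.

```python
from typing import Dict, List, Tuple


def uncertainty(p_positive: float) -> float:
    """Ambiguity of a binary prediction: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p_positive - 1.0)


def route_for_labeling(
    predictions: Dict[str, float],
    confidence_threshold: float,
    human_budget: int,
) -> Tuple[Dict[str, int], List[str]]:
    """Split one round of model predictions into machine labels and a human queue.

    `predictions` maps a hypothetical item id to the model's probability that the
    item belongs to the positive class.
    """
    auto_labels: Dict[str, int] = {}
    ambiguous: List[str] = []
    for item_id, p in predictions.items():
        confidence = max(p, 1.0 - p)
        if confidence >= confidence_threshold:
            # Confident prediction: accept the model's label without human review.
            auto_labels[item_id] = 1 if p >= 0.5 else 0
        else:
            ambiguous.append(item_id)
    # Uncertainty sampling: send the most ambiguous items to humans first,
    # on the heuristic that their labels are the most informative for retraining.
    ambiguous.sort(key=lambda item_id: uncertainty(predictions[item_id]), reverse=True)
    return auto_labels, ambiguous[:human_budget]


preds = {"img1": 0.98, "img2": 0.03, "img3": 0.52, "img4": 0.70, "img5": 0.40}
auto, queue = route_for_labeling(preds, confidence_threshold=0.95, human_budget=2)
print(auto)   # {'img1': 1, 'img2': 0}
print(queue)  # ['img3', 'img5']
```

Items that are neither auto-labeled nor queued for humans (`img4` here) stay unlabeled until a later round, after the model has been retrained on the new human labels; the retraining step itself is left out of this sketch.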

Level: 300

Speaker: Kris Skrinak - Partner Solutions Architect, ML Global Lead, AWS

  1. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Build highly accurate training datasets and reduce data labeling costs by up to 70% using machine learning
  2. How it works: raw data
  3. How it works
  4. How it works
  5. How it works
  6. How it works
  7. Creating training data
  8. Amazon SageMaker Ground Truth: Label machine learning training data easily and accurately
  9. Driving ease of use
  10. Dealing with documents is demanding. How can we make it easy?
  11. Traditional OCR only provides a "bag of letters". Example: raw OCR output of a page from a paper on clustering 137 protein structures, with the text column and a results table interleaved and characters misread (e.g., "MethOd Num' C'USte'S Rand mdex", "R50 7 92.096").
  12. Amazon Textract: an organized filing cabinet of document content. Text (for a search index): "…method. Then, the proteins were clustered using the k-medoids method with the optimal number of clusters. The performance of the various clusterings was evaluated using two types of measures. The first is the average silhouette width itself, which is a measure of the cluster compactness and separation. In general, clustering is based on the assumption that the underlying data form compact clusters of similar characteristics. Larger average silhouette width means that the result of a clustering algorithm consists of compact clusters which are well separated from each other, i.e. probably close to the actual data distribution. A small average silhouette width means e.g. that one of the clusters discovered by the clustering algorithm could be separated in two clusters, or that some…"
  13. Amazon Textract: an organized filing cabinet of document content. Table (for Amazon Aurora), listing method, number of clusters, and Rand index: TM-score, 8, 89.7%; FPFH, 9, 89.3%; 3DSC, 9, 89.5%; RSD, 7, 92.0%; VFH, 8, 85.3%; combined silhouette weights, 7, 92.2%; combined equal weights, 7, 90.2%.
  14. Amazon Textract: automatic document processing without data entry or writing rules. Sample form: name Presley, Elvis Aaron; address 3765 Elvis Presley Blvd., Graceland, Memphis, TN 38116; company TCB Limited; ID 12-12-1234; phone 901 987-6543; other entries 01 08 1935, RCA Records, Rock n Roll, Health. Typical forms: government forms (e.g., FDA new drug application, financial disclosure form, incident reporting) and tax forms (US: e.g., W2, 1099-MISC, 990, 1040; UK: e.g., P45; Canada: e.g., T4, T5).
  15. Amazon Textract: automatic document processing without data entry or writing rules. Key-value pairs extracted from the form: Name: Presley, Elvis Aaron; Address: Graceland, Memphis, TN; ID: 12-12-1234; Company: TCB Limited. Other form fields: 3765 Elvis Presley Blvd., 38116, 901 987-6543, Elvis.Presley@yahoo.com.
  16. Amazon Textract: extract text and data from virtually any document. Eliminate manual effort; lower document processing costs. Key features: extract data quickly and accurately; optical character recognition (OCR); key-value pair detection; adjustable confidence thresholds; table detection; bounding box coordinates; no ML experience required.
  17. We have many AI services with pre-trained models… but there is no master algorithm for recommendations and personalization.
  18. Music, film, products, content; tracks, artists, albums, actors, directors, genres, pricing, category, promotions, themes, demographics, breaking news
  19. Popularity trap: only showing the most popular items limits the individuality of recommendations. Cold starts: relevant recommendations must be surfaced even for customers with limited history. Scale: resonant recommendations need to scale across thousands of products and customers. Real time: personalization must work at low latency and respond to a customer's changing intent.
  20. Amazon Personalize: real-time personalization and recommendation service, based on the same technology used at Amazon.com. No ML experience required.
  21. Inputs: activity stream (views, signups, conversions, etc.), inventory (videos, products, articles, etc.), and optional demographics (name, age, location, etc.). Steps performed by Amazon Personalize: (1) load data, (2) inspect data, (3) identify features, (4) select algorithms, (5) select hyperparameters, (6) train models, (7) optimize models, (8) build feature store, (9) deploy and host models, (10) create real-time caches. Output: a customized personalization API.
  22. Amazon Personalize key features: real-time; works with almost any product or content; responsive to changes in intent; automated machine learning; bring existing algorithms from Amazon SageMaker; delivers high-quality recommendations; deep-learning-enabled algorithms; easy to use.
  23. Customers often ask… "How can we tap into Amazon's experience in machine learning?"
  24.
  25. Accuracy is the most important factor in forecasting. A recent study of 15 US companies revealed that a 15% improvement in forecast accuracy leads to a 3% improvement in pre-tax profit.* Under-forecasting leads to lost opportunity; over-forecasting leads to wasted resources. (*http://demand-planning.com/2018/07/12/how-much-does-forecasting-software-cost/)
  26. External factors: weather, holidays, events, and trends impact demand and should be integrated into forecasts. No history available: new products or processes with no prior data are very difficult to forecast. Additional variables: traditional models rarely take additional metadata into account because it is hard to obtain. Spiky or intermittent data: real-world data often exhibits irregular patterns, which cause traditional models to fail.
  27. Chart: sales and forecast over time
  28. Chart: sales and forecast over time
  29. Chart: sales and forecast over time
  30. Amazon Forecast: accurate time-series forecasting service, based on the same technology used at Amazon.com. No ML experience required.
  31. Fulfillment by Amazon (FBA) observed a 13.9% increase after switching from traditional methods to deep-learning-based ones. Today, FBA's forecasting is used by over 2M sellers to stock Amazon's warehouses with optimal inventory levels and fulfill demand.
  32. Inputs: historical data (sales, inventory, pricing, etc.) and related data (weather, competitive promotions, etc.). Steps performed by Amazon Forecast: (1) load data, (2) inspect data, (3) identify features, (4) select algorithms, (5) select hyperparameters, (6) train models, (7) optimize models, (8) deploy and host models. Output: a private, customized forecasting API.
  33. Predicts spikes accurately! Generates forecasts for new items. Learns relationships between multiple related time series. Incorporates external data (holidays, promotions, and so on).
  34. Custom models with no data sharing. Forecast any time series. Visualize and override forecasts. Easily export to Oracle or SAP.
  35. • Amazon Forecast is applicable across multiple domains. • You can set your domain using the console or the API. • You upload datasets with different schemas based on the domain.
  36.
  37. There are three types of datasets in Amazon Forecast: target time series (historic time-series data of the items to forecast), related time series (such as price, web hits, etc.), and item metadata (attributes of the item, such as category, genre, and brand).
  38. Import data from Amazon S3 buckets. Automatically select algorithms through the API or the console, or choose your own. Evaluate accuracy metrics and deploy to production. Visualize results in the console, or export forecasts.
  39. Amazon Forecast key features: easy to use through the console and API; works with any historical time series; more accurate forecasts that integrate external data; considers multiple time series at once; automatic machine learning; visualize forecasts in the console and import results into business apps; evaluate model accuracy through the console; schedule forecasts and model retraining; bring existing algorithms from Amazon SageMaker.
  40. Scaling machine learning
  41. Last year, ML was still too complicated
  42. Amazon SageMaker: build, train, and deploy ML
  43. Amazon SageMaker: build, train, and deploy ML
  44. Amazon SageMaker: build, train, and deploy ML
  45. Amazon SageMaker: build, train, and deploy ML
  46. Amazon SageMaker: build, train, and deploy ML
  47. Amazon SageMaker: build, train, and deploy ML
  48. SageMaker fabric: 4 discrete components
  49. Amazon SageMaker components work together. Build: Jupyter notebook interface for exploration; built-in, high-performance algorithms. Train: one-click training; automatic model tuning (hyperparameter tuning).
  50. Amazon SageMaker components work together. Build: Jupyter notebook interface for exploration; built-in, high-performance algorithms. Train: one-click training; automatic model tuning (hyperparameter tuning). Deploy: one-click deployment; fully managed hosting with auto-scaling. Execute: batch transform.
  51. JupyterLab: JupyterLab provides a high level of integration between notebooks, documents, and activities: • Drag and drop to reorder notebook cells and copy them between notebooks. • Run code blocks interactively from text files (.py, .R, .md, .tex, etc.). • Link a code console to a notebook kernel to explore code interactively without cluttering up the notebook with temporary scratch work. • Edit popular file formats with live preview, such as Markdown, JSON, CSV, Vega, Vega-Lite, and more. • JupyterLab is built on top of an extension system.
  52. JupyterLab extensions
  53. Git repositories
  54. AWS Marketplace for machine learning: ML algorithms and models available instantly
  55. Over 150 models and algorithms available. Some of the available algorithms and models: natural language processing, grammar & parsing, text OCR, computer vision, named entity recognition, video classification, speech recognition, text-to-speech, speaker identification, text classification, 3D images, anomaly detection, text generation, object detection, regression, text clustering, handwriting recognition, ranking.
  56. Optimization is extremely complex
  57. A deep learning model compiler that lets customers train models once and run them anywhere, with up to 2x improvement in performance
  58. Train once, run anywhere
  59. Amazon SageMaker Neo: train once, run anywhere with 2x the performance. Key features: compiler and runtime are open source; 1/10th the size of original models.
  60. Recap: updates to Amazon SageMaker. Choose your ML algorithm; optimize your ML algorithm; train and tune the model (trial and error); set up and manage environments for training; deploy the model in production; scale and manage the production environment.
  61. AI services (vision, speech, language, chatbots, forecasting, recommendations): Rekognition Image, Rekognition Video, Polly, Transcribe, Translate, Comprehend, Lex, Textract, Forecast, Personalize. ML services: Amazon SageMaker (build, train, deploy) with pre-built algorithms & notebooks, data labeling (Ground Truth), algorithms & models (AWS Marketplace for Machine Learning), one-click model training & tuning, optimization (Neo), reinforcement learning, and one-click deployment & hosting. ML frameworks & infrastructure: frameworks, interfaces, and infrastructure (EC2 P3 & P3N, EC2 C5, FPGAs, Greengrass, Elastic Inference).
  62. What's next for machine learning?
  63. Types of machine learning: supervised learning, unsupervised learning (chart axes: amount of training data required; sophistication of ML models)
  64. Types of machine learning: supervised learning, unsupervised learning (chart axes: amount of training data required; sophistication of ML models)
  65. Types of machine learning: reinforcement learning (RL), supervised learning, unsupervised learning (chart axes: amount of training data required; sophistication of ML models)
  66. How does RL work? Use cases
  67. New machine learning capabilities in Amazon SageMaker to build, train, and deploy with reinforcement learning
  68. Amazon SageMaker RL: reinforcement learning for every developer and data scientist. Key features: broad support for frameworks (TensorFlow, Apache MXNet, Intel Coach, and Ray RL); broad support for simulation environments, including Simulink and MATLAB (2D & 3D physics environments and OpenAI Gym support); supports Amazon Sumerian and Amazon RoboMaker; fully managed; example notebooks and tutorials.
  69. How can we get developers rolling with reinforcement learning (literally)?
  70. Fully autonomous 1/18th scale race car, driven by reinforcement learning
  71. Introducing AWS DeepRacer: HD video camera; dual-core Intel processor; four-wheel drive; dual power for compute and drive; accelerometer; gyroscope.
  72. AWS DeepRacer: fully autonomous 1/18th scale race car, driven by reinforcement learning
  73. Introducing the world's first global, autonomous racing league, open to anyone
  74. AWS DeepRacer League: competitive racing league for AWS DeepRacer. Compete virtually online; train models with RL; race in trials; fastest times advance; top 10 times score; final at re:Invent.
  75. Other ways to get started…
  76. Machine Learning University: uses the same materials used to train Amazon developers; foundational knowledge with real-world application; structured courses and specialist certification.
  77. Amazon ML Solutions Lab: brainstorming, custom modeling, and training. Work side by side with Amazon experts.
  78. Thank you! Kris Skrinak, PSA, Global ML Segment Lead, Amazon Web Services, skrinak@amazon.com
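The reinforcement-learning slides introduce SageMaker RL and AWS DeepRacer, but in this transcript the "How does RL work?" slide keeps only its title. As general background only, and not as a description of how SageMaker RL or DeepRacer trains models, here is a minimal Python sketch of tabular Q-learning on a hypothetical one-dimensional "track". The agent earns 1.0 for reaching the finish, pays a small penalty for every other move, and learns action values by trial and error with epsilon-greedy exploration. The track length, rewards, and hyperparameters (`ALPHA`, `GAMMA`, `EPSILON`) are illustrative assumptions.

```python
import random
from typing import List, Tuple

N_STATES = 6                           # hypothetical track positions 0..5; 5 is the finish line
ACTIONS = (-1, +1)                     # action 0 moves back one position, action 1 moves forward
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount factor, exploration rate


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy environment: +1.0 for reaching the finish, -0.01 for every other move."""
    next_state = min(max(state + action, 0), N_STATES - 1)  # the agent cannot leave the track
    done = next_state == N_STATES - 1
    reward = 1.0 if done else -0.01
    return next_state, reward, done


def train(episodes: int = 500, seed: int = 0) -> List[List[float]]:
    """Tabular Q-learning: q[s][a] estimates the discounted return of action a in state s."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random sometimes, otherwise exploit current estimates.
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted value of the best next action.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


def greedy_policy(q: List[List[float]]) -> List[int]:
    """Best-known move at each non-terminal position."""
    return [ACTIONS[0] if row[0] > row[1] else ACTIONS[1] for row in q[:-1]]


print(greedy_policy(train()))  # [1, 1, 1, 1, 1]
```

After training, the greedy policy drives forward (+1) from every position before the finish. A real agent such as a race car replaces the table with a function approximator over sensor inputs, but the reward-driven update has the same shape.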
