Mobileye adopted Amazon SageMaker to accelerate its deep learning model development, reducing time from months to under a week. Pipe Mode enabled training on Mobileye's large datasets without copying data to instances. Challenges like data format conversion and shuffling were addressed using SageMaker features and TensorFlow APIs. Adopting SageMaker provided Mobileye unlimited compute and helped simplify and scale its neural network training.
4. TensorFlow
https://www.tensorflow.org
• Main API in Python, with support for JavaScript, Java, and C++
• TensorFlow 1.x: symbolic execution
• ‘Define then run’: build a graph, optimize it, feed data, and compute
• Low-level API: variables, placeholders, tensor operations
• High-level API: tf.estimator.*
• Keras library: Sequential and Functional API, predefined layers
• TensorFlow 2.0: imperative execution (aka eager execution)
• ‘Define by run’: normal Python code, similar to numpy
• Run it, inspect it, debug it
• Keras is the preferred API
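To make the contrast concrete, here is a toy, framework-free sketch (not actual TensorFlow code) of 'define then run' versus 'define by run':

```python
# 'Define then run' (TF 1.x style): build an expression graph first,
# then feed data and evaluate it later.
class Add:
    def __init__(self, a, b):
        self.a, self.b = a, b
    def eval(self, feed):
        return feed[self.a] + feed[self.b]

graph = Add('x', 'y')                      # define the graph (no computation yet)
result_1x = graph.eval({'x': 2, 'y': 3})   # feed data and compute

# 'Define by run' (TF 2.x eager style): ordinary Python, computed immediately,
# so you can run it, inspect it, and debug it like any other code.
def add(x, y):
    return x + y

result_2x = add(2, 3)

print(result_1x, result_2x)  # 5 5
```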
5. AWS: The platform of choice for TensorFlow
https://aws.amazon.com/tensorflow/
• 85% of all TensorFlow workloads in the cloud run on AWS
• 89% of all deep learning workloads in the cloud run on AWS
6. TensorFlow: a first-class citizen on Amazon SageMaker
• Built-in TensorFlow containers for training and prediction
• Code available on GitHub: https://github.com/aws/sagemaker-tensorflow-containers
• Build it, run it on your own machine, customize it, etc.
• Versions: 1.4.1 → 1.15 (2.0 coming soon)
• Not just TensorFlow
• Standard tools: TensorBoard, TensorFlow Serving
• SageMaker features: Local Mode, Script Mode, Model Tuning, Spot Training, Pipe Mode, Amazon EFS & Amazon FSx for Lustre, Amazon Elastic Inference, etc.
• Performance optimizations: GPUs and CPUs (AWS, Intel MKL-DNN library)
• Distributed training: Parameter Server and Horovod
7. Amazon SageMaker re:Invent 2019 announcements
• SageMaker Studio
• SageMaker Notebooks (preview)
• SageMaker Debugger
• SageMaker Experiments
• SageMaker Model Monitor
• SageMaker Autopilot
9. Making Amazon SageMaker work for you
• Story of how we adopted Amazon SageMaker
• Ways in which Amazon SageMaker accelerated our development process
• Challenges we faced and how we overcame them
Spoiler: adopting Amazon SageMaker enabled us to reduce our development time by up to 10X (from several months to under a week)
11. A bit about Mobileye
Goal: use computer vision-based technologies to
• revolutionize the transportation industry
• make roads safer
• and save lives
Acquired by Intel in 2017 for $15.3 billion
Founded in 1999 by Prof. Amnon Shashua and Mr. Ziv Aviram
12. A bit about Mobileye
We develop a range of software products, deployed on a proprietary family of computer chips named EyeQ
Leading supplier of software that enables advanced driver-assistance systems (ADAS), deployed in over 40 million vehicles (includes adaptive cruise control, collision avoidance, lane departure warning, …)
Over 25 automaker partners, including most of the world’s largest
13. Some of our technologies
Rely on monocular camera perception
• Reduces cost
• Other sensors can be used for redundancy when working on high-level autonomous driving
14. Some of our Sensing Technologies
Use deep neural networks (DNNs) to detect:
• Road users – including vehicles and pedestrians
• Road semantics – including traffic lights, traffic signs, on-road arrows, stop-lines, and crosswalks
• Road boundaries – any delimiter of the drivable area, its 3D structure and semantics, including curbs, cones and debris
• Road geometry – driving paths and surface profile, including speed bumps and roadside ditches
17. From ADAS to autonomous
The key to full autonomy relies on three technological pillars, including Sensing and Mapping
• Localize the vehicle on a map of the surrounding environment
• Up to <10cm precision
23. From ADAS to autonomous
Learn more:
• From Mobileye VP Tal Babaioff
• AUT307 - Navigating the winding road toward driverless mobility
• Online https://www.mobileye.com
25. DNN training challenge
• Variety of different DNN architectures
• Mountains of data: a typical model may train on up to 200TB of data
• Simplified development cycle: EyeQ
• Historically performed on premises
29. Training on premises
• Obvious drawbacks to training on premises
• Limit to number of GPU instances
• Limitations to data capacity
• Challenge of staying up to date with latest HW and SW
• Any alternative must comply with our development pipeline, and must keep our IP safe
• Enter Amazon SageMaker …
30. What I like about Amazon SageMaker
• Unconstrained capacity – means I can spin up as many training sessions as I need
• Instance type variety
• Multi-generation CPU and GPU support
• Up to date with latest HW and SW (tuned to maximize efficiency)
• Distributed training support
• Easy learning curve
• Well documented with many samples
• Secure – means the data security team will let me use it
• Pipe Mode support
32. What is Amazon SageMaker Pipe Mode?
• Enables streaming training data from Amazon S3 to training instances
• Instead of copying data to each training device
33. Benefits of Pipe Mode
• Enables streaming training data from Amazon S3 to training instances
• Removes limitation on dataset size
• No need to have dedicated (and costly) storage for each training instance or to download data to each instance, and no delay to training start
• Enables decoupling of data from training instances
• Multiple training instances can all pull from the same dataset in S3
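The streaming idea can be sketched in a few lines of plain Python, using a generator over an in-memory stream as a toy stand-in for the S3-fed pipe (not the actual SageMaker implementation):

```python
import io

def stream_records(pipe, record_size):
    # Read fixed-size records sequentially from a pipe-like stream:
    # no random access, and no local copy of the full dataset.
    while True:
        chunk = pipe.read(record_size)
        if len(chunk) < record_size:
            break
        yield chunk

# Simulate a pipe delivering three 4-byte records.
pipe = io.BytesIO(b"aaaabbbbcccc")
records = list(stream_records(pipe, record_size=4))
print(records)  # [b'aaaa', b'bbbb', b'cccc']
```

The training loop only ever holds the current record in memory, which is why dataset size stops being a constraint.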
34. Pipe Mode with TensorFlow
• No need to fear having to manage the data stream on your own…
• Amazon SageMaker wraps the pipe with an implementation of the tf.data.Dataset API
• Complete with all the standard dataset functions
• Ready to be fed directly into your model
35. Setting up Pipe Mode
from sagemaker.tensorflow import TensorFlow
tensorflow = TensorFlow(
    entry_point='pipemode.py',
    input_mode='Pipe', …)
pipes = {'train': 's3://sagemaker-path-to-data'}
tensorflow.fit(pipes)
36. PipeModeDataset initialization
import tensorflow as tf
from sagemaker_tensorflow import PipeModeDataset

def train_input_fn():  # wrapped in an input function so that 'return ds' is valid
    def parse(record):
        feature = {'label': tf.FixedLenSequenceFeature([], tf.int64, allow_missing=True),
                   'image_raw': tf.FixedLenFeature([], tf.string)}
        features = tf.parse_single_example(record, feature)
        image = tf.decode_raw(features['image_raw'], tf.uint8)
        label = features['label']
        return {"image": image}, label  # This is what will be fed into your model

    ds = PipeModeDataset("train", record_format='TFRecord')
    ds = ds.apply(tf.data.experimental.map_and_batch(parse, batch_size=32, num_parallel_batches=2))
    return ds
38. Pipe Mode challenges
• Converting data to supported format
• Sequential nature of pipe mode
• Pipe number limitation
• Currently stands at 20
39. Pipe Mode challenges
• Converting data to supported format
• Amazon SageMaker’s PipeModeDataset supports a limited number of formats
• Format conversion of “mountains” of data to TFRecord format seemed daunting
40. Generating data in TFRecord format
• Turned out to be a blessing in disguise
• Task was performed using AWS Batch
• Used up to hundreds of thousands of vCPUs in parallel (each created a 100MB TFRecord file)
• Tip: split generated TFRecord files into 100MB chunks
• Accelerated data preparation time significantly (from days to hours)
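The chunking tip can be illustrated with a minimal, hypothetical sharding helper, grouping serialized records under a byte budget (the real conversion job ran on AWS Batch):

```python
def shard_records(records, max_shard_bytes):
    # Group serialized records into shards of at most max_shard_bytes each,
    # mirroring the tip of splitting TFRecord output into ~100MB files.
    shards, current, size = [], [], 0
    for rec in records:
        if current and size + len(rec) > max_shard_bytes:
            shards.append(current)
            current, size = [], 0
        current.append(rec)
        size += len(rec)
    if current:
        shards.append(current)
    return shards

# Four 40-byte records with a 100-byte budget land in two 80-byte shards.
records = [b"x" * 40, b"x" * 40, b"x" * 40, b"x" * 40]
shards = shard_records(records, max_shard_bytes=100)
print([len(s) for s in shards])  # [2, 2]
```

Even-sized shards keep every pipe (and every worker in distributed training) fed at a similar rate.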
41. Generating data in TFRecord format
• One more step to moving end-to-end development to the cloud:
• Training and evaluation performed using Amazon SageMaker
• Deployment script run on Amazon SageMaker
46. Pipe Mode challenges
• Sequential nature of pipe mode
• Overcoming the lack of random access to the full dataset
• Random access was relied on for shuffling and boosting
47. Overcoming the lack of random access to data
• Challenge: shuffling
• In the pre-Amazon SageMaker era, we relied on random access to the data to ensure shuffling
• Solution: introduce shuffling on a number of levels
• Use the Amazon SageMaker ShuffleConfig class to shuffle TFRecord files for each epoch:
train_data = s3_input('s3://sagemaker-path-to-train-data',
                      shuffle_config=ShuffleConfig(seed))
pipes = {'train': train_data}
• Use the TF dataset shuffle
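The two levels can be sketched in framework-free Python: shuffle the file order once per epoch (what ShuffleConfig does), then pass records through a bounded shuffle buffer (what the TF dataset shuffle does). This is a toy analogue, not the SageMaker or TF implementation:

```python
import random

def two_level_shuffle(files, buffer_size, seed):
    rng = random.Random(seed)
    rng.shuffle(files)  # level 1: shuffle file order for this epoch
    buf = []            # level 2: bounded shuffle buffer over the record stream
    for record in (r for f in files for r in f):
        buf.append(record)
        if len(buf) >= buffer_size:
            yield buf.pop(rng.randrange(len(buf)))
    while buf:          # drain the buffer at the end of the epoch
        yield buf.pop(rng.randrange(len(buf)))

files = [[1, 2], [3, 4], [5, 6]]
out = list(two_level_shuffle(files, buffer_size=3, seed=0))
print(sorted(out))  # [1, 2, 3, 4, 5, 6]: every record exactly once
```

Neither level needs random access to the whole dataset: one permutes file names, the other only holds buffer_size records in memory.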
48. Overcoming the lack of random access to data
• Challenge: boosting, i.e. increasing the representation of underrepresented data (e.g., pink cars in a vehicle detection DNN)
• In the pre-Amazon SageMaker era, we relied on random access to the data to perform boosting
• Solution:
• Underrepresented data can be separated and fed using a dedicated pipe
• Dataset APIs can be used to associate a weight with each pipe and interleave the pipes:
ds = tf.data.experimental.sample_from_datasets(datasets, weights)
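A minimal pure-Python analogue of this weighted interleaving (a toy stand-in for the TF dataset call, with hypothetical names) shows how a dedicated pipe boosts a rare class:

```python
import random

def sample_from_pipes(pipes, weights, seed=0):
    # Interleave several record streams, drawing each next record from a
    # pipe chosen with the given probability weights, until all are drained.
    rng = random.Random(seed)
    iters = [iter(p) for p in pipes]
    live = list(range(len(pipes)))
    while live:
        i = rng.choices(live, weights=[weights[j] for j in live])[0]
        try:
            yield next(iters[i])
        except StopIteration:
            live.remove(i)  # this pipe is exhausted

common = ['car'] * 8       # well-represented class pipe
rare = ['pink_car'] * 8    # boosted, underrepresented class pipe
mixed = list(sample_from_pipes([common, rare], weights=[0.5, 0.5]))
print(mixed.count('pink_car'))  # 8: rare examples fully interleaved
```

With equal weights, the rare pipe contributes as often as the common one during the interleave, which is exactly the boosting effect wanted.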
49. Pipe Mode challenges
• Overcoming the limitation on number of pipes
• Why might I need more than 20 pipes?
• Boost underrepresented data
• Distributed training with Horovod: multiply by the number of GPUs
51. Overcoming the limitation on number of pipes
• Solution: use Pipe Mode manifest files
• An alternative way to configure an Amazon SageMaker pipe
• Replace path prefix with a list of files
• Include the same file multiple times to increase its weight
• Have finer control over the data used for training
data = s3_input('s3://path-to-manifest-file',
s3_data_type='ManifestFile',
shuffle_config=ShuffleConfig(seed))
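Generating such a weighted manifest takes only a few lines. The layout sketched here, a prefix object followed by keys relative to that prefix, follows the documented S3DataSource ManifestFile format; the bucket and file names are hypothetical:

```python
import json

def build_manifest(prefix, files_with_weights):
    # ManifestFile layout: a JSON array whose first element is a
    # {"prefix": ...} object, followed by object keys relative to that
    # prefix. Repeating a key increases that file's weight in training.
    manifest = [{"prefix": prefix}]
    for key, weight in files_with_weights:
        manifest.extend([key] * weight)
    return json.dumps(manifest)

m = build_manifest("s3://bucket/train/",
                   [("common.tfrecord", 1), ("rare.tfrecord", 3)])
print(m)
```

The resulting JSON can be uploaded to S3 and referenced with s3_data_type='ManifestFile', giving finer control over the training data without extra pipes.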
53. Other considerations when moving to Amazon SageMaker
• Blog post: https://bit.ly/2sLRJb5
• Distributed training
• How to choose the instance type with the optimal number of GPUs
• Should one use TensorFlow distributed training or Horovod?
• How does one maximize resource utilization?
• How to take advantage of Spot Instances to reduce cost
• How to ensure that mid-train termination won’t have negative impact
• Using TensorBoard and other benchmarking tools
• Debugging on a remote environment
• Tip: make sure your model compiles and runs locally before running in the cloud
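A common way to make mid-train Spot terminations harmless is periodic checkpointing with resume-on-restart. A framework-agnostic sketch (file name and step logic are hypothetical, not the SageMaker checkpointing API):

```python
import json
import os
import tempfile

CKPT = os.path.join(tempfile.gettempdir(), "train_ckpt.json")

def train(total_steps):
    # Resume from the last checkpoint if a previous (interrupted) run left one.
    step = 0
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            step = json.load(f)["step"]
    while step < total_steps:
        step += 1                      # one training step (placeholder)
        if step % 10 == 0:             # checkpoint periodically
            with open(CKPT, "w") as f:
                json.dump({"step": step}, f)
    return step

if os.path.exists(CKPT):
    os.remove(CKPT)                    # start from a clean state for the demo
done = train(25)
print(done)  # 25
```

If the instance is reclaimed mid-run, the next run picks up from the last saved step instead of restarting from zero, which is what makes Spot pricing safe to use for long training jobs.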
54. Summary
• Adopting Amazon SageMaker has boosted our DNN development capabilities
• Pipe Mode offers a compelling solution for training with large datasets
• Reduced development time significantly (~10X)
• Scalability enables you to test multiple models in parallel
• Integrating with other AWS services can provide further acceleration
• Challenges were overcome using advanced Amazon SageMaker features and TF dataset APIs
• You can make Amazon SageMaker work for you too