SlideShare a Scribd company logo
1 of 25
AUTOMATED DEEP LEARNING TRAINING
WITH
AWS STEP FUNCTIONS / AWS LAMBDA
@mizti
PROBLEM WITH
DEEP LEARNING TRAING
1.
SERVER WITH GPU
REQUIRED
SERVERS WITH GNU ARE
EXPENSIVE
• Some thousands dollars / month with on demand instance
• Spot instance with bidding system: much low priced, but not
ignorable price for me
NOT IGNORABLE PRICE ?
• It costs equal to 1 or 2 “Tirol choco” for each server /
hour
• Not much, but I worry about…
* WELL-KNOWN IN
JAPAN, THE PRONOUN
OF CHEAP CONFECTION
AND
IT TAKES VERY LONG TIME
Half day, One day,
Occasionally some days
I WANT TO TERMINATE
SERVERS
ONCE TRAINING
COMPLETED
SO
PROBLEM WITH
DEEP LEARNING TRAINING
2.
ANNOYING COLLECTION OF
TRAINED DATA
WITH ONE SERVER,
IT TAKES ONLY FEW
MINUTES
WITH SCP
WITH MANY SERVERS,
IT TAKES LONG TIME
WHAT IS WORSE, WE DON’T KNOW WHEN
EACH TASK COMPETE
IN EACH SERVER
AND I GET CONFUSED
“WHAT WAS THE SETTING FOR THIS
SERVER?”
AT LAST, I TERMINATE
SERVER
WITHOUT EXTRACTING
DATA
I WANT TO GATHER DATA INTO
ONE PLACE AUTOMATICALLY
SO
AND WANT TO LABEL TRAINING
CONDITIONS…
SERVER-LESS
ARCHITECTURE
• Serverless computing (with my understanding) is
• Generate servers when I need, Terminate servers once task
completed
• Does not use any server to control above.
• Thus, I don’t need have any server usually,
and can generate any numbers of server when / as many as I
need.
• (becoming buzz-word these days ?)
SERVER-LESS SERVICES IN AWS
• AWS Lambda
• Users can register code with Node.js / Python /
Java / C#
• Registered codes can be hooked with events
from inside of AWS (and can be kicked by hand,
of cause)
• Users can automate AWS control with AWS SDK
for each languages ( like boto3 for Python )
• No special libraries for AWS Lambda,
IOW: AWS Lambda is just a register / starting
mechanism of codes
• One Lambda function can be alive only 60
seconds at most, so AWS Lambda is not suitable
for
long-time / many-state jobs.
SERVER-LESS SERVICES IN AWS
• AWS Step Functions
• Users can define multi-state machine like
“cell automaton”
• Fork / Parallel processes are also can be
defined
• Each state inputs / receives data into /
from AWS Lambda functions.
• You can check status of states (process)
with Web UI visually.
• Users can control long-time / multi-path
process
WHAT I WANTED TO MAKE:
1. Create S3 bucket for each execution
2. Bid a spot instance
3. If the bidding suceeds, and a spot instance is generated,
• Notify with AWS SNS (Email or SMS)
• Prepare to training ( Downloading training etc.)
• Start training
• Periodically upload model dump / output data / logs into S3 bucket
4. Once training completed
• Notify with AWS SNS (Email or SMS)
• Terminate instance after a certain period of times
I MADE:
Create S3 bucket
Request Spot Instance
Check if the bidding succeeded
Notify bidding success
Check if the task completed
Wait for the task completed
Notify task completed
Terminate Spot instance
USAGE
• Input a set of json like below to start Step Function
• exec_name: name of this execution (also become a name of S3 bucket)
• repository url: git repository of code to exec ( used like git clone {repository url} )
• data_dir / output_dir: directory of training data and output data
• data_get_command: command executed before training. (typically, getting training data for
machine learning)
• exec_command: executed command for training.
USAGE
Input a json, and ..
USAGE
Just push "Start Execution"
USAGE
・Progress can be checked on Web UI
・Output result is automatically carried into
S3 bucket.
BENEFIT
• Start and Forget. Sleep peacefully.
• Make it easy to parallel execution with many
patterns of hyper-params
• No need of modifying training / model codes
• Maybe used also for many kinds of
batch-like process
MISC
• Author: @mizti
any comments / questions welcomed
• Details: wrote in my blog (but in Japanese lang ; )
http://mizti.hatenablog.com/entry/deeplearningwithawsstepf
unction
• Code repository:
https://github.com/mizti/aws_stepfunc_chainer
• Illustration in this slides:
http://www.irasutoya.com/

More Related Content

What's hot

Ruby On Google App Engine 2nd Athens Ruby Me
Ruby On Google App Engine 2nd Athens Ruby MeRuby On Google App Engine 2nd Athens Ruby Me
Ruby On Google App Engine 2nd Athens Ruby MePanagiotis Papadopoulos
 
High performance in react native
High performance in react nativeHigh performance in react native
High performance in react nativeViet Tran
 
Lotuscript for large systems
Lotuscript for large systemsLotuscript for large systems
Lotuscript for large systemsBill Buchan
 
Олександр Хотемський:”Serverless архітектура та її застосування в автоматизац...
Олександр Хотемський:”Serverless архітектура та її застосування в автоматизац...Олександр Хотемський:”Serverless архітектура та її застосування в автоматизац...
Олександр Хотемський:”Serverless архітектура та її застосування в автоматизац...Dakiry
 
Problems you’ll face in the Microservices World: Configuration, Authenticatio...
Problems you’ll face in the Microservices World: Configuration, Authenticatio...Problems you’ll face in the Microservices World: Configuration, Authenticatio...
Problems you’ll face in the Microservices World: Configuration, Authenticatio...Quentin Adam
 
Introduction to scaling your WordPress site past a single node using AWS
Introduction to scaling your WordPress site past a single node using AWSIntroduction to scaling your WordPress site past a single node using AWS
Introduction to scaling your WordPress site past a single node using AWSWP Engine
 
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravaranfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas SiravaraGluster.org
 
Ember Overview in 5 Minutes
Ember Overview in 5 MinutesEmber Overview in 5 Minutes
Ember Overview in 5 MinutesJay Phelps
 
Introduce cucumber
Introduce cucumberIntroduce cucumber
Introduce cucumberBachue Zhou
 
Makes JavaScript Fun Again
Makes JavaScript Fun AgainMakes JavaScript Fun Again
Makes JavaScript Fun AgainAsep Bagja
 
Keeping MongoDB Data Safe
Keeping MongoDB Data SafeKeeping MongoDB Data Safe
Keeping MongoDB Data SafeTony Tam
 
MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017
MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017
MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017Quentin Adam
 
Coffee script final
Coffee script finalCoffee script final
Coffee script finalpriyankazope
 
Tech Talk #4 : Functional Reactive Programming - Đặng Thái Sơn
Tech Talk #4 : Functional Reactive Programming - Đặng Thái SơnTech Talk #4 : Functional Reactive Programming - Đặng Thái Sơn
Tech Talk #4 : Functional Reactive Programming - Đặng Thái SơnNexus FrontierTech
 
Introduction to Django-Celery and Supervisor
Introduction to Django-Celery and SupervisorIntroduction to Django-Celery and Supervisor
Introduction to Django-Celery and SupervisorSuresh Kumar
 
Ansible on aws - Pop-up Loft Tel Aviv
Ansible on aws - Pop-up Loft Tel AvivAnsible on aws - Pop-up Loft Tel Aviv
Ansible on aws - Pop-up Loft Tel AvivAmazon Web Services
 

What's hot (20)

Dev-Friendly Ops
Dev-Friendly OpsDev-Friendly Ops
Dev-Friendly Ops
 
Ruby On Google App Engine 2nd Athens Ruby Me
Ruby On Google App Engine 2nd Athens Ruby MeRuby On Google App Engine 2nd Athens Ruby Me
Ruby On Google App Engine 2nd Athens Ruby Me
 
High performance in react native
High performance in react nativeHigh performance in react native
High performance in react native
 
Lotuscript for large systems
Lotuscript for large systemsLotuscript for large systems
Lotuscript for large systems
 
Async js
Async jsAsync js
Async js
 
Apache pulsar
Apache pulsarApache pulsar
Apache pulsar
 
Олександр Хотемський:”Serverless архітектура та її застосування в автоматизац...
Олександр Хотемський:”Serverless архітектура та її застосування в автоматизац...Олександр Хотемський:”Serverless архітектура та її застосування в автоматизац...
Олександр Хотемський:”Serverless архітектура та її застосування в автоматизац...
 
Problems you’ll face in the Microservices World: Configuration, Authenticatio...
Problems you’ll face in the Microservices World: Configuration, Authenticatio...Problems you’ll face in the Microservices World: Configuration, Authenticatio...
Problems you’ll face in the Microservices World: Configuration, Authenticatio...
 
Introduction to scaling your WordPress site past a single node using AWS
Introduction to scaling your WordPress site past a single node using AWSIntroduction to scaling your WordPress site past a single node using AWS
Introduction to scaling your WordPress site past a single node using AWS
 
FireBug And FirePHP
FireBug And FirePHPFireBug And FirePHP
FireBug And FirePHP
 
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravaranfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara
 
Ember Overview in 5 Minutes
Ember Overview in 5 MinutesEmber Overview in 5 Minutes
Ember Overview in 5 Minutes
 
Introduce cucumber
Introduce cucumberIntroduce cucumber
Introduce cucumber
 
Makes JavaScript Fun Again
Makes JavaScript Fun AgainMakes JavaScript Fun Again
Makes JavaScript Fun Again
 
Keeping MongoDB Data Safe
Keeping MongoDB Data SafeKeeping MongoDB Data Safe
Keeping MongoDB Data Safe
 
MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017
MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017
MONITORING THE UNKNOWN, 1000*100 SERIES A DAY - DEVOXX MOROCCO 2017
 
Coffee script final
Coffee script finalCoffee script final
Coffee script final
 
Tech Talk #4 : Functional Reactive Programming - Đặng Thái Sơn
Tech Talk #4 : Functional Reactive Programming - Đặng Thái SơnTech Talk #4 : Functional Reactive Programming - Đặng Thái Sơn
Tech Talk #4 : Functional Reactive Programming - Đặng Thái Sơn
 
Introduction to Django-Celery and Supervisor
Introduction to Django-Celery and SupervisorIntroduction to Django-Celery and Supervisor
Introduction to Django-Celery and Supervisor
 
Ansible on aws - Pop-up Loft Tel Aviv
Ansible on aws - Pop-up Loft Tel AvivAnsible on aws - Pop-up Loft Tel Aviv
Ansible on aws - Pop-up Loft Tel Aviv
 

Viewers also liked

AWS Step FunctionとLambdaでディープラーニングの訓練を全自動化する
AWS Step FunctionとLambdaでディープラーニングの訓練を全自動化するAWS Step FunctionとLambdaでディープラーニングの訓練を全自動化する
AWS Step FunctionとLambdaでディープラーニングの訓練を全自動化するmizugokoro
 
Introduction to AWS Step Functions:
Introduction to AWS Step Functions: Introduction to AWS Step Functions:
Introduction to AWS Step Functions: Amazon Web Services
 
八子Openingプレゼン 130525
八子Openingプレゼン 130525八子Openingプレゼン 130525
八子Openingプレゼン 130525知礼 八子
 
Aws Lambda Cart Microservice Server Less
Aws Lambda Cart Microservice Server LessAws Lambda Cart Microservice Server Less
Aws Lambda Cart Microservice Server LessDhanu Gupta
 
スーパーコンピューターとクラウドでのOpenFOAM性能・費用ベンチマークテスト
スーパーコンピューターとクラウドでのOpenFOAM性能・費用ベンチマークテストスーパーコンピューターとクラウドでのOpenFOAM性能・費用ベンチマークテスト
スーパーコンピューターとクラウドでのOpenFOAM性能・費用ベンチマークテストMasanori Sumitomo
 
Micro services infrastructure with AWS and Ansible
Micro services infrastructure with AWS and AnsibleMicro services infrastructure with AWS and Ansible
Micro services infrastructure with AWS and AnsibleBamdad Dashtban
 
座談会資料 事前配布 20170225
座談会資料 事前配布 20170225座談会資料 事前配布 20170225
座談会資料 事前配布 20170225知礼 八子
 
(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...
(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...
(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...Amazon Web Services
 
Deliver Docker Containers Continuously on AWS - QCon 2017
Deliver Docker Containers Continuously on AWS - QCon 2017Deliver Docker Containers Continuously on AWS - QCon 2017
Deliver Docker Containers Continuously on AWS - QCon 2017Philipp Garbe
 
HPC で使えそうな FPGA 搭載 AWS F1 インスタンス 20170127
HPC で使えそうな FPGA 搭載 AWS F1 インスタンス 20170127HPC で使えそうな FPGA 搭載 AWS F1 インスタンス 20170127
HPC で使えそうな FPGA 搭載 AWS F1 インスタンス 20170127HPCシステムズ株式会社
 
AWS Step Functions 実践
AWS Step Functions 実践AWS Step Functions 実践
AWS Step Functions 実践Shuji Kikuchi
 
ウフル・Enebular紹介 170225
ウフル・Enebular紹介 170225ウフル・Enebular紹介 170225
ウフル・Enebular紹介 170225知礼 八子
 
Poster-An Expert System for Car Failure Diagnosis
Poster-An Expert System for Car Failure DiagnosisPoster-An Expert System for Car Failure Diagnosis
Poster-An Expert System for Car Failure DiagnosisViralkumar Jayswal
 
Auto scaling using Amazon Web Services ( AWS )
Auto scaling using Amazon Web Services ( AWS )Auto scaling using Amazon Web Services ( AWS )
Auto scaling using Amazon Web Services ( AWS )Harish Ganesan
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesAmazon Web Services
 
Auto scaling with Ruby, AWS, Jenkins and Redis
Auto scaling with Ruby, AWS, Jenkins and RedisAuto scaling with Ruby, AWS, Jenkins and Redis
Auto scaling with Ruby, AWS, Jenkins and RedisYi Hsuan (Jeddie) Chuang
 
Building a Machine Learning App with AWS Lambda
Building a Machine Learning App with AWS LambdaBuilding a Machine Learning App with AWS Lambda
Building a Machine Learning App with AWS LambdaSri Ambati
 
NEW LAUNCH! Building Distributed Applications with AWS Step Functions
NEW LAUNCH! Building Distributed Applications with AWS Step FunctionsNEW LAUNCH! Building Distributed Applications with AWS Step Functions
NEW LAUNCH! Building Distributed Applications with AWS Step FunctionsAmazon Web Services
 

Viewers also liked (20)

AWS Step FunctionとLambdaでディープラーニングの訓練を全自動化する
AWS Step FunctionとLambdaでディープラーニングの訓練を全自動化するAWS Step FunctionとLambdaでディープラーニングの訓練を全自動化する
AWS Step FunctionとLambdaでディープラーニングの訓練を全自動化する
 
Introduction to AWS Step Functions:
Introduction to AWS Step Functions: Introduction to AWS Step Functions:
Introduction to AWS Step Functions:
 
八子Openingプレゼン 130525
八子Openingプレゼン 130525八子Openingプレゼン 130525
八子Openingプレゼン 130525
 
Aws Lambda Cart Microservice Server Less
Aws Lambda Cart Microservice Server LessAws Lambda Cart Microservice Server Less
Aws Lambda Cart Microservice Server Less
 
スーパーコンピューターとクラウドでのOpenFOAM性能・費用ベンチマークテスト
スーパーコンピューターとクラウドでのOpenFOAM性能・費用ベンチマークテストスーパーコンピューターとクラウドでのOpenFOAM性能・費用ベンチマークテスト
スーパーコンピューターとクラウドでのOpenFOAM性能・費用ベンチマークテスト
 
Micro services infrastructure with AWS and Ansible
Micro services infrastructure with AWS and AnsibleMicro services infrastructure with AWS and Ansible
Micro services infrastructure with AWS and Ansible
 
座談会資料 事前配布 20170225
座談会資料 事前配布 20170225座談会資料 事前配布 20170225
座談会資料 事前配布 20170225
 
(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...
(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...
(GAM301) Real-Time Game Analytics with Amazon Kinesis, Amazon Redshift, and A...
 
Deliver Docker Containers Continuously on AWS - QCon 2017
Deliver Docker Containers Continuously on AWS - QCon 2017Deliver Docker Containers Continuously on AWS - QCon 2017
Deliver Docker Containers Continuously on AWS - QCon 2017
 
HPC で使えそうな FPGA 搭載 AWS F1 インスタンス 20170127
HPC で使えそうな FPGA 搭載 AWS F1 インスタンス 20170127HPC で使えそうな FPGA 搭載 AWS F1 インスタンス 20170127
HPC で使えそうな FPGA 搭載 AWS F1 インスタンス 20170127
 
AWS Step Functions 実践
AWS Step Functions 実践AWS Step Functions 実践
AWS Step Functions 実践
 
ウフル・Enebular紹介 170225
ウフル・Enebular紹介 170225ウフル・Enebular紹介 170225
ウフル・Enebular紹介 170225
 
Poster-An Expert System for Car Failure Diagnosis
Poster-An Expert System for Car Failure DiagnosisPoster-An Expert System for Car Failure Diagnosis
Poster-An Expert System for Car Failure Diagnosis
 
Auto scaling using Amazon Web Services ( AWS )
Auto scaling using Amazon Web Services ( AWS )Auto scaling using Amazon Web Services ( AWS )
Auto scaling using Amazon Web Services ( AWS )
 
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar SeriesMigrate your Data Warehouse to Amazon Redshift - September Webinar Series
Migrate your Data Warehouse to Amazon Redshift - September Webinar Series
 
Auto scaling with Ruby, AWS, Jenkins and Redis
Auto scaling with Ruby, AWS, Jenkins and RedisAuto scaling with Ruby, AWS, Jenkins and Redis
Auto scaling with Ruby, AWS, Jenkins and Redis
 
Building a Machine Learning App with AWS Lambda
Building a Machine Learning App with AWS LambdaBuilding a Machine Learning App with AWS Lambda
Building a Machine Learning App with AWS Lambda
 
Incident Coordination Workshop
Incident Coordination WorkshopIncident Coordination Workshop
Incident Coordination Workshop
 
NEW LAUNCH! Building Distributed Applications with AWS Step Functions
NEW LAUNCH! Building Distributed Applications with AWS Step FunctionsNEW LAUNCH! Building Distributed Applications with AWS Step Functions
NEW LAUNCH! Building Distributed Applications with AWS Step Functions
 
Technical Track
Technical TrackTechnical Track
Technical Track
 

Similar to Automation of Deep learning training with AWS Step Functions

Serverless in java Lessons learnt
Serverless in java Lessons learntServerless in java Lessons learnt
Serverless in java Lessons learntKrzysztof Pawlowski
 
Serverless in Java Lessons learnt
Serverless in Java Lessons learntServerless in Java Lessons learnt
Serverless in Java Lessons learntKrzysztof Pawlowski
 
AWS DevOps - Terraform, Docker, HashiCorp Vault
AWS DevOps - Terraform, Docker, HashiCorp VaultAWS DevOps - Terraform, Docker, HashiCorp Vault
AWS DevOps - Terraform, Docker, HashiCorp VaultGrzegorz Adamowicz
 
Getting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesGetting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesAmazon Web Services
 
Scaling Django Apps using AWS Elastic Beanstalk
Scaling Django Apps using AWS Elastic BeanstalkScaling Django Apps using AWS Elastic Beanstalk
Scaling Django Apps using AWS Elastic BeanstalkLushen Wu
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusJakob Karalus
 
GOTO Stockholm - AWS Lambda - Logic in the cloud without a back-end
GOTO Stockholm - AWS Lambda - Logic in the cloud without a back-endGOTO Stockholm - AWS Lambda - Logic in the cloud without a back-end
GOTO Stockholm - AWS Lambda - Logic in the cloud without a back-endIan Massingham
 
AWS Lambda Presentation (Tech Talk DC)
AWS Lambda Presentation (Tech Talk DC)AWS Lambda Presentation (Tech Talk DC)
AWS Lambda Presentation (Tech Talk DC)Doguhan Uluca
 
Building Serverless Web Applications - DevDay Austin 2017
Building Serverless Web Applications - DevDay Austin 2017Building Serverless Web Applications - DevDay Austin 2017
Building Serverless Web Applications - DevDay Austin 2017Amazon Web Services
 
Serverless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloadsServerless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloadsTensult
 
Journey towards serverless infrastructure
Journey towards serverless infrastructureJourney towards serverless infrastructure
Journey towards serverless infrastructureVille Seppänen
 
Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...
Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...
Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...Amazon Web Services
 
OpenNTF Webinar - October 2021: Return of the DOTS
OpenNTF Webinar - October 2021: Return of the DOTSOpenNTF Webinar - October 2021: Return of the DOTS
OpenNTF Webinar - October 2021: Return of the DOTSSerdar Basegmez
 
Cloudy in Indonesia: Java and Cloud
Cloudy in Indonesia: Java and CloudCloudy in Indonesia: Java and Cloud
Cloudy in Indonesia: Java and CloudEberhard Wolff
 
Serverless at Lifestage
Serverless at LifestageServerless at Lifestage
Serverless at LifestageBATbern
 
Getting Started with AWS Lambda & Serverless Cloud
Getting Started with AWS Lambda & Serverless CloudGetting Started with AWS Lambda & Serverless Cloud
Getting Started with AWS Lambda & Serverless CloudIan Massingham
 
Domain's Robot Army
Domain's Robot ArmyDomain's Robot Army
Domain's Robot Armydomaingroup
 
Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Pavel Chunyayev
 
Real time system_performance_mon
Real time system_performance_monReal time system_performance_mon
Real time system_performance_monTomas Doran
 

Similar to Automation of Deep learning training with AWS Step Functions (20)

Serverless in java Lessons learnt
Serverless in java Lessons learntServerless in java Lessons learnt
Serverless in java Lessons learnt
 
Serverless in Java Lessons learnt
Serverless in Java Lessons learntServerless in Java Lessons learnt
Serverless in Java Lessons learnt
 
AWS DevOps - Terraform, Docker, HashiCorp Vault
AWS DevOps - Terraform, Docker, HashiCorp VaultAWS DevOps - Terraform, Docker, HashiCorp Vault
AWS DevOps - Terraform, Docker, HashiCorp Vault
 
Getting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesGetting Started with Serverless Architectures
Getting Started with Serverless Architectures
 
Scaling Django Apps using AWS Elastic Beanstalk
Scaling Django Apps using AWS Elastic BeanstalkScaling Django Apps using AWS Elastic Beanstalk
Scaling Django Apps using AWS Elastic Beanstalk
 
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob KaralusDistributed Tensorflow with Kubernetes - data2day - Jakob Karalus
Distributed Tensorflow with Kubernetes - data2day - Jakob Karalus
 
GOTO Stockholm - AWS Lambda - Logic in the cloud without a back-end
GOTO Stockholm - AWS Lambda - Logic in the cloud without a back-endGOTO Stockholm - AWS Lambda - Logic in the cloud without a back-end
GOTO Stockholm - AWS Lambda - Logic in the cloud without a back-end
 
AWS Lambda Presentation (Tech Talk DC)
AWS Lambda Presentation (Tech Talk DC)AWS Lambda Presentation (Tech Talk DC)
AWS Lambda Presentation (Tech Talk DC)
 
Building Serverless Web Applications - DevDay Austin 2017
Building Serverless Web Applications - DevDay Austin 2017Building Serverless Web Applications - DevDay Austin 2017
Building Serverless Web Applications - DevDay Austin 2017
 
Serverless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloadsServerless design considerations for Cloud Native workloads
Serverless design considerations for Cloud Native workloads
 
Journey towards serverless infrastructure
Journey towards serverless infrastructureJourney towards serverless infrastructure
Journey towards serverless infrastructure
 
AWS Lambda and Serverless Cloud
AWS Lambda and Serverless CloudAWS Lambda and Serverless Cloud
AWS Lambda and Serverless Cloud
 
Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...
Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...
Get the EDGE to scale: Using Cloudfront along with edge compute to scale your...
 
OpenNTF Webinar - October 2021: Return of the DOTS
OpenNTF Webinar - October 2021: Return of the DOTSOpenNTF Webinar - October 2021: Return of the DOTS
OpenNTF Webinar - October 2021: Return of the DOTS
 
Cloudy in Indonesia: Java and Cloud
Cloudy in Indonesia: Java and CloudCloudy in Indonesia: Java and Cloud
Cloudy in Indonesia: Java and Cloud
 
Serverless at Lifestage
Serverless at LifestageServerless at Lifestage
Serverless at Lifestage
 
Getting Started with AWS Lambda & Serverless Cloud
Getting Started with AWS Lambda & Serverless CloudGetting Started with AWS Lambda & Serverless Cloud
Getting Started with AWS Lambda & Serverless Cloud
 
Domain's Robot Army
Domain's Robot ArmyDomain's Robot Army
Domain's Robot Army
 
Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015Ansible benelux meetup - Amsterdam 27-5-2015
Ansible benelux meetup - Amsterdam 27-5-2015
 
Real time system_performance_mon
Real time system_performance_monReal time system_performance_mon
Real time system_performance_mon
 

Recently uploaded

Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfngoud9212
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 

Recently uploaded (20)

Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort ServiceHot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
Hot Sexy call girls in Panjabi Bagh 🔝 9953056974 🔝 Delhi escort Service
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Bluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdfBluetooth Controlled Car with Arduino.pdf
Bluetooth Controlled Car with Arduino.pdf
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 

Automation of Deep learning training with AWS Step Functions

  • 1. AUTOMATED DEEP LEARNING TRAINING WITH AWS STEP FUNCTIONS / AWS LAMBDA @mizti
  • 4. SERVERS WITH GNU ARE EXPENSIVE • Some thousands dollars / month with on demand instance • Spot instance with bidding system: much low priced, but not ignorable price for me
  • 5. NOT IGNORABLE PRICE ? • It costs equal to 1 or 2 “Tirol choco” for each server / hour • Not much, but I worry about… * WELL-KNOWN IN JAPAN, THE PRONOUN OF CHEAP CONFECTION
  • 6. AND IT TAKES VERY LONG TIME Half day, One day, Occasionally some days
  • 7. I WANT TO TERMINATE SERVERS ONCE TRAINING COMPLETED SO
  • 10. WITH ONE SERVER, IT TAKES ONLY FEW MINUTES WITH SCP
  • 11. WITH MANY SERVERS, IT TAKES LONG TIME WHAT IS WORSE, WE DON’T KNOW WHEN EACH TASK COMPETE IN EACH SERVER
  • 12. AND I GET CONFUSED “WHAT WAS THE SETTING FOR THIS SERVER?”
  • 13. AT LAST, I TERMINATE SERVER WITHOUT EXTRACTING DATA
  • 14. I WANT TO GATHER DATA INTO ONE PLACE AUTOMATICALLY SO AND WANT TO LABEL TRAINING CONDITIONS…
  • 15. SERVER-LESS ARCHITECTURE • Serverless computing (with my understanding) is • Generate servers when I need, Terminate servers once task completed • Does not use any server to control above. • Thus, I don’t need have any server usually, and can generate any numbers of server when / as many as I need. • (becoming buzz-word these days ?)
  • 16. SERVER-LESS SERVICES IN AWS • AWS Lambda • Users can register code with Node.js / Python / Java / C# • Registered codes can be hooked with events from inside of AWS (and can be kicked by hand, of cause) • Users can automate AWS control with AWS SDK for each languages ( like boto3 for Python ) • No special libraries for AWS Lambda, IOW: AWS Lambda is just a register / starting mechanism of codes • One Lambda function can be alive only 60 seconds at most, so AWS Lambda is not suitable for long-time / many-state jobs.
  • 17. SERVER-LESS SERVICES IN AWS • AWS Step Functions • Users can define multi-state machine like “cell automaton” • Fork / Parallel processes are also can be defined • Each state inputs / receives data into / from AWS Lambda functions. • You can check status of states (process) with Web UI visually. • Users can control long-time / multi-path process
  • 18. WHAT I WANTED TO MAKE: 1. Create S3 bucket for each execution 2. Bid a spot instance 3. If the bidding suceeds, and a spot instance is generated, • Notify with AWS SNS (Email or SMS) • Prepare to training ( Downloading training etc.) • Start training • Periodically upload model dump / output data / logs into S3 bucket 4. Once training completed • Notify with AWS SNS (Email or SMS) • Terminate instance after a certain period of times
  • 19. I MADE: Create S3 bucket Request Spot Instance Check if the bidding succeeded Notify bidding success Check if the task completed Wait for the task completed Notify task completed Terminate Spot instance
  • 20. USAGE • Input a set of json like below to start Step Function • exec_name: name of this execution (also become a name of S3 bucket) • repository url: git repository of code to exec ( used like git clone {repository url} ) • data_dir / output_dir: directory of training data and output data • data_get_command: command executed before training. (typically, getting training data for machine learning) • exec_command: executed command for training.
  • 23. USAGE ・Progress can be checked on Web UI ・Output result is automatically carried into S3 bucket.
  • 24. BENEFIT • Start and Forget. Sleep peacefully. • Make it easy to parallel execution with many patterns of hyper-params • No need of modifying training / model codes • Maybe used also for many kinds of batch-like process
  • 25. MISC • Author: @mizti any comments / questions welcomed • Details: wrote in my blog (but in Japanese lang ; ) http://mizti.hatenablog.com/entry/deeplearningwithawsstepf unction • Code repository: https://github.com/mizti/aws_stepfunc_chainer • Illustration in this slides: http://www.irasutoya.com/