4. SERVERS WITH GPU ARE
EXPENSIVE
• Several thousand dollars / month with an on-demand instance
• Spot instances (bidding system): much cheaper, but still not a
negligible price for me
5. NOT A NEGLIGIBLE PRICE?
• It costs about the same as 1 or 2 “Tirol choco”* per server per
hour
• Not much, but it still worries me…
* WELL KNOWN IN
JAPAN AS A BYWORD
FOR CHEAP CONFECTIONERY
6. AND
IT TAKES A VERY LONG TIME
Half a day, a day,
occasionally several days
7. SO
I WANT TO TERMINATE
SERVERS
ONCE TRAINING
HAS COMPLETED
11. WITH MANY SERVERS,
IT TAKES A LONG TIME
WHAT IS WORSE, WE DON’T KNOW WHEN
EACH TASK COMPLETES
ON EACH SERVER
12. AND I GET CONFUSED
“WHAT WAS THE SETTING FOR THIS
SERVER?”
13. IN THE END, I TERMINATE
A SERVER
WITHOUT EXTRACTING
ITS DATA
14. SO
I WANT TO GATHER DATA INTO
ONE PLACE AUTOMATICALLY
AND WANT TO LABEL TRAINING
CONDITIONS…
15. SERVER-LESS
ARCHITECTURE
• Serverless computing (as I understand it) means:
• Generate servers when I need them, terminate them once the task
has completed
• Do not use any server to control the above
• Thus, I usually don’t need to keep any server running,
and can generate as many servers as I need, whenever I need
them.
• (becoming a buzzword these days?)
16. SERVER-LESS SERVICES IN AWS
• AWS Lambda
• Users can register code written in Node.js / Python /
Java / C#
• Registered code can be hooked to events
from inside AWS (and can be kicked off by hand,
of course)
• Users can automate AWS operations with the AWS SDK
for each language (like boto3 for Python)
• No special libraries are needed for AWS Lambda;
in other words, AWS Lambda is just a mechanism for
registering / starting code
• A single Lambda invocation can run for only a few
minutes at most (the timeout is capped at 15 minutes), so
AWS Lambda is not suitable for
long-running / many-state jobs.
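As a rough illustration of the bullets above: a Python Lambda function is just a handler that receives an event and may call the AWS SDK. The sketch below is not the author's code. The handler name, the `instance_id` event key, and the injectable `ec2_client` parameter are assumptions made for illustration; `terminate_instances` is the boto3 EC2 client call.

```python
def terminate_handler(event, context, ec2_client=None):
    """Terminate the EC2 instance named in the event (hypothetical event shape)."""
    if ec2_client is None:
        # Imported lazily so the handler can be exercised with a fake client.
        import boto3
        ec2_client = boto3.client("ec2")

    instance_id = event["instance_id"]  # assumed key, not taken from the slides
    ec2_client.terminate_instances(InstanceIds=[instance_id])

    # Return the event plus a flag so a caller (for example a later
    # workflow state) can see what happened.
    result = dict(event)
    result["terminated"] = True
    return result
```

Passing the client in as a parameter keeps the handler short and lets it be tried locally without touching a real AWS account.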
17. SERVER-LESS SERVICES IN AWS
• AWS Step Functions
• Users can define a multi-state machine, like a
“cellular automaton”
• Fork / parallel processes can also be
defined
• Each state passes data to / receives data from
AWS Lambda functions.
• You can check the status of states (processes)
visually in the Web UI.
• Users can control long-running / multi-path
processes
18. WHAT I WANTED TO MAKE:
1. Create an S3 bucket for each execution
2. Bid for a spot instance
3. If the bid succeeds and a spot instance is launched,
• Notify with AWS SNS (Email or SMS)
• Prepare for training (downloading training data, etc.)
• Start training
• Periodically upload model dumps / output data / logs to the S3 bucket
4. Once training has completed
• Notify with AWS SNS (Email or SMS)
• Terminate the instance after a certain period of time
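The "periodically upload" step could look roughly like the following on the training instance. This is a minimal sketch under assumptions, not the author's implementation: the function name and the key layout (path relative to the output directory) are hypothetical, and `s3_client` is expected to be a boto3 S3 client (its `upload_file(Filename, Bucket, Key)` method is used).

```python
import os

def upload_outputs(s3_client, bucket, output_dir):
    """Upload every file under output_dir, keyed by its relative path."""
    uploaded = []
    for root, _dirs, files in os.walk(output_dir):
        for name in sorted(files):
            path = os.path.join(root, name)
            # Use forward slashes so keys look the same on every OS.
            key = os.path.relpath(path, output_dir).replace(os.sep, "/")
            s3_client.upload_file(path, bucket, key)
            uploaded.append(key)
    return uploaded
```

Calling this from a loop with `time.sleep`, or from a cron entry, would give the periodic behaviour; each call simply overwrites the previous copies.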
19. I MADE:
Create S3 bucket
Request Spot Instance
Check if the bidding succeeded
Notify bidding success
Check if the task has completed
Wait for the task to complete
Notify task completed
Terminate Spot instance
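The states listed above could be written in the Amazon States Language roughly as follows. This is a sketch under assumptions, not the definition from the author's repository: the Lambda function names, the placeholder ARN prefix, the `$.bid_fulfilled` / `$.task_completed` fields, the wait durations, and the extra `WaitForBidding` state are all hypothetical.

```python
import json

# Placeholder prefix; real ARNs depend on your account and region (assumption).
LAMBDA_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:"

def task(function_name, next_state=None):
    """Task state that invokes one (hypothetical) Lambda function."""
    state = {"Type": "Task", "Resource": LAMBDA_ARN + function_name}
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state

def choice(variable, if_true, otherwise):
    """Choice state that branches on a boolean field of the state input."""
    return {
        "Type": "Choice",
        "Choices": [{"Variable": variable, "BooleanEquals": True, "Next": if_true}],
        "Default": otherwise,
    }

def wait(seconds, next_state):
    """Wait state that pauses before polling again."""
    return {"Type": "Wait", "Seconds": seconds, "Next": next_state}

definition = {
    "Comment": "Sketch of a spot-instance training workflow",
    "StartAt": "CreateS3Bucket",
    "States": {
        "CreateS3Bucket": task("create_s3_bucket", "RequestSpotInstance"),
        "RequestSpotInstance": task("request_spot_instance", "CheckBidding"),
        "CheckBidding": task("check_bidding", "IsBiddingFulfilled"),
        "IsBiddingFulfilled": choice("$.bid_fulfilled", "NotifyBiddingSuccess", "WaitForBidding"),
        "WaitForBidding": wait(60, "CheckBidding"),
        "NotifyBiddingSuccess": task("notify_bidding_success", "CheckTaskCompleted"),
        "CheckTaskCompleted": task("check_task_completed", "IsTaskCompleted"),
        "IsTaskCompleted": choice("$.task_completed", "NotifyTaskCompleted", "WaitForTask"),
        "WaitForTask": wait(300, "CheckTaskCompleted"),
        "NotifyTaskCompleted": task("notify_task_completed", "TerminateSpotInstance"),
        "TerminateSpotInstance": task("terminate_spot_instance"),
    },
}

print(json.dumps(definition, indent=2))
```

The check / wait loop is what lets short-lived Lambda functions supervise a job that runs for hours: each check is a quick invocation, and the waiting happens inside the state machine.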
20. USAGE
• Pass a JSON input with the following keys to start the Step Functions execution:
• exec_name: name of this execution (also becomes the name of the S3 bucket)
• repository url: git repository of the code to execute (used like git clone {repository url})
• data_dir / output_dir: directories for training data and output data
• data_get_command: command executed before training (typically, fetching training data for
machine learning)
• exec_command: command executed for training.
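To make the input format concrete, here is a hedged sketch of building such an input and starting an execution with boto3. The exact spelling of the repository key (`repository_url` here), the helper names, and reusing `exec_name` as the execution name are assumptions; `start_execution` with `stateMachineArn`, `name`, and `input` is the boto3 Step Functions client call.

```python
import json

def build_execution_input(exec_name, repository_url, data_dir, output_dir,
                          data_get_command, exec_command):
    """Collect the keys described above into one dict (key spelling assumed)."""
    return {
        "exec_name": exec_name,
        "repository_url": repository_url,  # assumed spelling of "repository url"
        "data_dir": data_dir,
        "output_dir": output_dir,
        "data_get_command": data_get_command,
        "exec_command": exec_command,
    }

def start_training(sfn_client, state_machine_arn, execution_input):
    """Start one execution; sfn_client is a boto3 'stepfunctions' client."""
    return sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        name=execution_input["exec_name"],  # assumption: reuse exec_name here
        input=json.dumps(execution_input),
    )
```

With real credentials, `sfn_client` would be `boto3.client("stepfunctions")`. Since `exec_name` also becomes an S3 bucket name, it has to follow S3 naming rules (lowercase letters, digits, and hyphens, and globally unique).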
23. USAGE
• Progress can be checked in the Web UI
• Output is automatically uploaded to the
S3 bucket.
24. BENEFIT
• Start and Forget. Sleep peacefully.
• Makes it easy to run many patterns of
hyperparameters in parallel
• No need to modify training / model code
• May also be usable for many kinds of
batch-like processes
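As one way to picture the parallel-execution benefit, the sketch below expands a base input into one input per hyperparameter combination, so that each could be started as its own execution. Everything here is hypothetical: the helper name, the `--name value` flag style, and the naming scheme for `exec_name` are illustrative assumptions.

```python
import itertools

def hyperparameter_inputs(base_input, grid):
    """Return one input dict per combination of values in grid."""
    names = sorted(grid)
    inputs = []
    for values in itertools.product(*(grid[n] for n in names)):
        # Assumed flag style: "--lr 0.1 --batchsize 32"
        flags = " ".join(f"--{n} {v}" for n, v in zip(names, values))
        # Dots are replaced so the name stays usable as a bucket name.
        suffix = "-".join(str(v).replace(".", "p") for v in values)
        item = dict(base_input)
        item["exec_name"] = f"{base_input['exec_name']}-{suffix}"
        item["exec_command"] = f"{base_input['exec_command']} {flags}"
        inputs.append(item)
    return inputs
```

Each resulting dict gets a distinct `exec_name`, which keeps the training conditions attached to the outputs of that run.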
25. MISC
• Author: @mizti
Any comments / questions are welcome
• Details: written up on my blog (in Japanese only ;)
http://mizti.hatenablog.com/entry/deeplearningwithawsstepfunction
• Code repository:
https://github.com/mizti/aws_stepfunc_chainer
• Illustrations in these slides:
http://www.irasutoya.com/