These are the slides from Session B-2: EC2 Container Service Deep Dive at IVS CTO Night & Day 2015 Winter, held on December 7, 2015. For a look at the event and the other slide decks, see the blog post below.
http://aws.typepad.com/sajp/2015/12/ivs-cto-night-day-2015-winter-powered-by-aws.html
51. Remind
• A messaging platform for teachers.
• Chat/announcements/files
• Over 30 million users
• Used actively in ~50% of U.S. public schools
• Over 2 billion messages delivered
• ~50 employees. ~30 engineers.
52. Heroku was great, but…
• Every app on Heroku is publicly accessible
• Databases need to be exposed to Internet traffic
• Limited visibility and control
53. What we want from a PaaS
• AWS
• Flexibility
• Shared patterns for deployment
• Easy service operation
• Containers/Docker
55. Design Goals
• Easy to operate
• Open source
• Support 12-factor stateless apps (12factor.net)
• Swappable scheduling back-ends
• Stability!
• Docker images as a unit of deployment
57. Empire :: V2
An open-source, self-hosted PaaS for running twelve-factor Docker apps backed by AWS services
58. Twelve-Factor Tenets
I. Codebase
II. Dependencies
III. Config
IV. Backing Services
V. Build, release, run
VI. Processes
VII. Port binding
VIII. Concurrency
IX. Disposability
X. Dev/prod parity
XI. Logs
XII. Admin processes
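As a rough illustration of factors III, IV, and VII, the sketch below reads all deployment-specific settings from the environment. The variable names `PORT` and `DATABASE_URL` and the `Settings` type are assumptions chosen for the example; the slides do not say which names Empire uses.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    port: int
    database_url: str


def load_settings(env=None):
    """Build settings from environment variables (factor III, Config)."""
    env = os.environ if env is None else env
    return Settings(
        # Factor VII (Port binding): the platform tells the app where to listen.
        # The default of 8080 is an assumption for local development.
        port=int(env.get("PORT", "8080")),
        # Factor IV (Backing services): the database is an attached resource
        # identified only by a URL, so it can be swapped without a code change.
        database_url=env["DATABASE_URL"],
    )
```

Because nothing is hard-coded, the same Docker image can move between environments with only its environment variables changed.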
73. Bad Old Days of Batch Processing @ Coursera
Cascade
• PHP-based job runner
• Originally ran in screen sessions
• Polled APIs for new jobs
• Forced restarts on a regular basis due to unidentified memory leaks
• Fragile and unreliable
The early days…
74. Bad Old Days of Batch Processing @ Coursera
Saturn
• Scala scheduled batch job runner
• Powered by Quartz Scheduler library
• Better than Cascade, but…
• All jobs ran on the same JVM, causing interference
The not-so-early days?
75. What Else Did We Look At?
Home-grown Tech
• Tried, but proved to be unreliable
• Difficult to handle coordination and synchronization
• Powerful, but hard to productionize
• Needs developers with experience
• Designed for GCE first
• Not a managed service, higher Ops load
76. Amazon ECS to the Rescue
Little maintenance
Integrated with rest of AWS
Easy to develop for
77. However…
Amazon ECS is a great building block, but we still need to build tools around it for our purposes.
78. What We Built: Iguazú
Marissa Strniste (https://www.flickr.com/photos/mstrniste/5999464924) CC-BY-2.0
• Batch job scheduler for Amazon ECS, with three ways to run a job:
  • Immediately
  • Deferred (run once at X time)
  • Scheduled recurring (cron-like)
• Programmatically accessible internally via our standard APIs and clients
• Named for Iguazú Falls
  • World’s largest waterfall by volume
  • We hope Iguazú handles a similar volume of jobs
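The three ways of running a job can be sketched as small data types plus a function that works out the next start time. This is only an illustration, not Iguazú's actual API: the type names and `next_run` are hypothetical, and a fixed interval stands in for a real cron expression.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Immediate:
    """Run as soon as possible."""


@dataclass(frozen=True)
class Deferred:
    """Run once at a given time."""
    run_at: datetime


@dataclass(frozen=True)
class Recurring:
    """Run repeatedly; a fixed interval stands in for a cron expression."""
    first_run: datetime
    interval: timedelta


def next_run(schedule, now: datetime) -> Optional[datetime]:
    """Return the next start time at or after `now`, or None if there is none."""
    if isinstance(schedule, Immediate):
        return now
    if isinstance(schedule, Deferred):
        # A one-off job whose time has already passed has no future run.
        return schedule.run_at if schedule.run_at >= now else None
    if isinstance(schedule, Recurring):
        if now <= schedule.first_run:
            return schedule.first_run
        # Ceiling division: number of whole intervals needed to reach `now`.
        periods = -((schedule.first_run - now) // schedule.interval)
        return schedule.first_run + periods * schedule.interval
    raise TypeError(f"unknown schedule type: {type(schedule).__name__}")
```

A real scheduler would also need to record which runs have already been started; that state is deliberately left out here.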
81. Deploying Jobs
Easy Deployment
1. Developers merge into master. Done!
Jenkins Build Steps:
1. Builds zip package from master
2. Prepares Docker image with zip file
3. Pushes image into Docker registry
4. Registers updated jobs with Amazon ECS API
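The last build step might look roughly like the following, which assembles the request for the ECS RegisterTaskDefinition action. The helper name, job name, image tag, and resource values are placeholders for illustration; the slides do not show Coursera's actual build script.

```python
def task_definition(job_name, image, memory_mib=512, cpu_units=256):
    """Build a RegisterTaskDefinition request for a single-container job.

    All argument values are illustrative assumptions.
    """
    return {
        # Each registration under the same family creates a new revision.
        "family": job_name,
        "containerDefinitions": [
            {
                "name": job_name,
                # Image pushed to the Docker registry in the previous step.
                "image": image,
                "memory": memory_mib,
                "cpu": cpu_units,
                "essential": True,
            }
        ],
    }


payload = task_definition("nightly-report", "registry.example.com/jobs:abc123")

# With boto3 installed and AWS credentials configured, the payload could be
# sent with:
#   boto3.client("ecs").register_task_definition(**payload)
```

Keeping the payload construction separate from the API call makes the build step easy to test without touching AWS.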
82. Since April 2015…
65 jobs in production
>1000 runs per day
44 different scheduled jobs
84. The Security Challenge
Compiling and running untrusted, arbitrary code in Amazon EC2
Would you like to compile and run C code from random people on the Internet on your servers?
85. What We Built: GrID
Patrick Hoesly (https://www.flickr.com/photos/zooboing/5665221326/) CC-BY-2.0
• Service + architecture for grading programming assignments
• Builds on Amazon ECS and Iguazú
• Named for Tron’s “digital frontier”
• Backronym: Grading Inside Docker
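To give a sense of what running untrusted code inside Docker can involve, the sketch below builds a `docker run` command line with common hardening flags. These are general Docker options, not a description of how GrID is configured; the helper name, image name, and limits are assumptions, and containers alone should not be treated as a complete sandbox.

```python
def sandboxed_run_command(image, command, memory="256m", cpus="0.5", pids_limit=64):
    """Return an argv list for running `command` in a restricted container.

    The limits are illustrative defaults, not recommended values.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",          # no network access for the submission
        "--memory", memory,           # cap memory use
        "--cpus", cpus,               # cap CPU time
        "--pids-limit", str(pids_limit),  # limit process count (fork bombs)
        "--read-only",                # read-only root filesystem
        "--cap-drop", "ALL",          # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        image, *command,
    ]


cmd = sandboxed_run_command("grader:latest", ["./run-tests"])
# The list can be passed to subprocess.run(cmd, timeout=...) on a host
# where Docker is installed.
```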