© 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
AWS re:INVENT
Ubisoft: How For Honor Runs Using
Amazon ECS
Ralf Mueller
Louis-Michel Gélinas
November 27, 2017
GAM307
Introductions
Ralf Mueller
Online Technical Architect
For Honor
Ubisoft Montréal
Louis-Michel Gélinas
DevOps Team Lead
Game Online Operations
Ubisoft Montréal
Special thanks to our teams!
First closed alpha for Ubisoft
Biggest open beta for Ubisoft—6 million players
For Honor “Tribute” trailer
For Honor: The Journey
• Trailblaze!
• The core beliefs
• When to beautify!
• Bridges and tunnels
Fail fast! Succeed consistently!
The For Honor core beliefs
Fail fast! Succeed consistently!
Development ease
Automation
Managed infrastructure
A beginning
• Limited cloud experience
• Desire to leverage cloud advantages (elasticity, managed services)
• Buy-in from the project
• Limited support from internal partners
• Small team with other tasks
• No option of full continuous delivery because of console constraints
• On-premises systems not ready to interact with off-premises systems
For Honor production block diagram
[Diagram. Components shown: game clients, Application Load Balancer, front-end ECS, backend ECS, service discovery ECS, ancillary ECS, supporting services, Amazon CloudFront, S3 (Faction War world state), ElastiCache (Redis), AWS Lambda, Amazon Elasticsearch Service, on-premises DC. All traffic goes over the public Internet.]
Getting in shape
“Hello World” service
• Play application
• AWS Elastic Beanstalk
• Couchbase data layer
• Provisioning using a shell script
• Validation of the tech and methods
# Create Elastic Beanstalk environment template
aws elasticbeanstalk create-configuration-template \
  --application-name "${APPLICATION_NAME}" \
  --template-name "${APPLICATION_NAME}-template" \
  --solution-stack-name "64bit Amazon Linux 2014.09 v1.2.0 running Docker 1.3.3" \
  --option-settings \
    OptionName=InstanceType,Namespace=aws:autoscaling:launchconfiguration,Value=${INSTANCE_TYPE} \
    OptionName=IamInstanceProfile,Namespace=aws:autoscaling:launchconfiguration,Value=aws-elasticbeanstalk-ec2-role \
    OptionName=EC2KeyName,Namespace=aws:autoscaling:launchconfiguration,Value=${SSH_KEY_NAME} \
    OptionName=EnvironmentType,Namespace=aws:elasticbeanstalk:environment,Value=${ENVIRONMENT_TYPE} \
    OptionName=VPCId,Namespace=aws:ec2:vpc,Value=vpc-58f2bb3d \
    OptionName=Subnets,Namespace=aws:ec2:vpc,Value=subnet-9739b2e0 \
    OptionName=AssociatePublicIpAddress,Namespace=aws:ec2:vpc,Value=false

# Create Elastic Beanstalk environment from template
aws elasticbeanstalk create-environment \
  --application-name "${APPLICATION_NAME}" \
  --environment-name "${APPLICATION_NAME}-${ENVIRONMENT_NAME}" \
  --template-name "${APPLICATION_NAME}-template" \
  --cname-prefix "${APPLICATION_NAME}-${ENVIRONMENT_NAME}"

# Give AWS time to process the environment creation request
sleep 300  # should be replaced with proper status checks through the AWS API

# Modify the associated security group to restrict access to the new application
AWS_SECURITY_GROUP=$(aws ec2 describe-security-groups \
  --filters "Name=tag-value,Values=${APPLICATION_NAME}-${ENVIRONMENT_NAME}" \
  | awk '/GroupId/ {gsub(/[",]/, ""); print $2}')
aws ec2 authorize-security-group-ingress --group-id "${AWS_SECURITY_GROUP}" --protocol tcp \
  --port 22 --cidr xxx.xxx.xxx.0/20
aws ec2 authorize-security-group-ingress --group-id "${AWS_SECURITY_GROUP}" --protocol tcp \
  --port 80 --cidr xxx.xxx.xxx.0/20
aws ec2 authorize-security-group-ingress --group-id "${AWS_SECURITY_GROUP}" --protocol tcp \
  --port
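
The script's own comment notes that the fixed `sleep 300` should give way to real status checks. A minimal sketch of that idea in Python follows; it is not the team's code. `wait_until_ready` and its `get_status` callback are hypothetical names, and the callback is assumed to return the environment status string (for example by wrapping `aws elasticbeanstalk describe-environments` and reading the `Status` field).

```python
import time


def wait_until_ready(get_status, timeout_s=600, interval_s=15,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "Ready" or the timeout expires.

    get_status is a caller-supplied (hypothetical) function returning the
    environment status as a string, e.g. "Launching" or "Ready".
    Returns True once the environment is ready, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "Ready":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Unlike a fixed sleep, this returns as soon as the environment is ready and reports a timeout explicitly, so the calling script can stop instead of editing a security group that does not exist yet.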
A dead end
• Play framework not a good fit
• Elastic Beanstalk too managed
• Couchbase not a good fit
Automation
Managed infrastructure
Development ease
Fail fast! Succeed consistently!
Planning the route
• GO! 1.5 years before launch
• Two non-mission-critical services
  • Faction War
  • Player information enrichment (PIE)
• Immutable Docker images
• Minimize resources for development
• Run everything locally
• Namespace databases
• Multiple UAT environments
• Scale out for production
Setting out at dawn
Vertical slice
Manual AWS CloudFormation setup for basics
• VPC
• ECS clusters
• Security groups
• ElastiCache instances
• Elasticsearch clusters
ECS task and service management using Python scripts
• Emulates a human running aws-cli commands
Setting out at dawn
First success retrospective
Manual AWS CloudFormation setup for basics
- Depends on documentation
- Fear-driven opposition to change
- Manual tweaks untracked
ECS task and service management using Python scripts
- Scripted parts depend on manual setup
- Hard to orchestrate (no rollback, golden path only)
Automation
Managed infrastructure
Development ease
Fail fast! Succeed consistently!
The last mile
• Monitoring and alerting with DataDog
• Logs and metrics
• Load tests
• Track key KPIs
From trail to autobahn
ECS boot sequence fragility
600 ALBs are not practical!
Instance cycling automation
Automate AWS CloudFormation
Proxies vs. tunnels vs. Internet
ECS boot 101
[Diagram: a front-end ASG and a backend ASG, each containing instances that run the ECS agent]
ECS boot sequence
Registration and autokill
1. yum install -y jq aws-cli
2. Get instance details
3. Register instance in OpsWorks
4. Set up ECS and start agent
• All steps above can fail
• Retries with timeout
• After 5 minutes: auto-terminate
- Step one must not fail!
- We scan clusters: Is ASG instance count equal to cluster instance count?
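
The retry, timeout, and cluster-scan ideas on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not the boot script itself: `run_boot_steps` and `missing_from_cluster` are hypothetical names, each step is assumed to be a callable that raises on failure, and the 5-minute budget comes from the slide.

```python
import time


def run_boot_steps(steps, budget_s=300, retry_delay_s=10,
                   sleep=time.sleep, clock=time.monotonic):
    """Run each step in order, retrying failures until the budget runs out.

    steps is a list of (name, callable) pairs; a step fails by raising.
    Returns "ok" when every step succeeded, or "terminate" when the time
    budget is exhausted, telling the caller to terminate the instance.
    """
    deadline = clock() + budget_s
    for _name, step in steps:
        while True:
            try:
                step()
                break
            except Exception:
                # Give up if another retry would overrun the budget.
                if clock() + retry_delay_s > deadline:
                    return "terminate"
                sleep(retry_delay_s)
    return "ok"


def missing_from_cluster(asg_instance_ids, cluster_instance_ids):
    """Return ASG instances that never registered with the ECS cluster."""
    return sorted(set(asg_instance_ids) - set(cluster_instance_ids))
```

The second function corresponds to the cluster scan: comparing the two ID sets rather than only the counts also identifies which instances failed to register.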
Not all are created equal
UAT
• 100s of ECS services in 2–5 clusters
• 100–150 environments (changing weekly)
• HAProxy/Route 53 routing
  - Single node
  - Deployment latency
PROD
• 10s of services on 2 clusters
• 3 static environments (PS4, Xbox One, PC)
• ALBs for scale and reliability
  + Multi-node
  + Seamless deployments
  - IP hungry
Not all are created equal
ALB in PROD as opposed to HAProxy and Route 53 in UAT
• 600+ ALBs not practical
• HAProxy container running on every instance (OpsWorks provisioned)
• Scan instance for running services every minute
  • Check for new services
  • Update Route 53 entries
  • Update local HAProxy config
• Route host header to local container port
Not all are created equal
frontend http-in
    timeout client 1m
    bind *:80 name http
    bind *:443 name https ssl crt /etc/ssl/cert/acme_com_cert.pem
    acl host_MGW-Team-HERO-PC-UAT-X hdr(host) -i MGW-Team-HERO-PC-UAT-X.acme.com
    use_backend MGW-Team-HERO-PC-UAT-X if host_MGW-Team-HERO-PC-UAT-X
backend MGW-Team-HERO-PC-UAT-X
    balance leastconn
    timeout connect 10s
    timeout server 1m
    option httpclose
    option forwardfor
    cookie JSESSIONID prefix
    server a3dbf14d3a92 172.17.0.9:12551 cookie A check
Repeat for each container on the host (1-40)
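
A per-minute scan could regenerate this configuration from the list of containers running on the host. The sketch below shows one way to do that in Python. It is not the deck's scanner: `render_haproxy` and the keys of each service dict (`name`, `container_id`, `ip`, `port`) are assumptions made for illustration, while the emitted directives mirror the sample config.

```python
def render_haproxy(services, domain="acme.com"):
    """Render frontend rules and one backend per container on this host.

    services: list of dicts with the (hypothetical) keys
    name, container_id, ip and port.
    """
    lines = [
        "frontend http-in",
        "    timeout client 1m",
        "    bind *:80 name http",
        "    bind *:443 name https ssl crt /etc/ssl/cert/acme_com_cert.pem",
    ]
    # One host-header ACL and routing rule per service.
    for svc in services:
        name = svc["name"]
        lines.append(f"    acl host_{name} hdr(host) -i {name}.{domain}")
        lines.append(f"    use_backend {name} if host_{name}")
    # One backend per service, pointing at the local container port.
    for svc in services:
        lines += [
            f"backend {svc['name']}",
            "    balance leastconn",
            "    timeout connect 10s",
            "    timeout server 1m",
            "    option httpclose",
            "    option forwardfor",
            "    cookie JSESSIONID prefix",
            f"    server {svc['container_id']} {svc['ip']}:{svc['port']} cookie A check",
        ]
    return "\n".join(lines) + "\n"
```

Rendering the whole file from scratch on every scan keeps the config a pure function of what is running, so removed containers disappear without any cleanup logic.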
Security updates
[Diagram: instance draining flow. Labels: Terminating, Amazon SNS, Lambda function, Set to draining, Check, Reinvoke, Complete hook]
Security updates
Lambdas fill gaps in offering
• Tag instances with information needed
  • Cluster name
• Sleep Lambda before re-invoking (or suffer throttling)
• Don't repeat calls: add results to SNS message
• Inspiration came from this AWS blog post:
https://aws.amazon.com/blogs/compute/how-to-automate-container-instance-draining-in-amazon-ecs/
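
The drain, check, and re-invoke loop described on these slides might look roughly like the Python sketch below. It is an approximation, not the team's Lambda: `handle_termination` is a hypothetical name, the message keys `cluster`, `container_instance_arn`, and `draining` are assumptions standing in for the results cached in the SNS message, and `ecs`, `asg`, and `sns` are assumed to be boto3-style clients passed in by the caller.

```python
import json
import time


def handle_termination(message, ecs, asg, sns, topic_arn, pause_s=5,
                       sleep=time.sleep):
    """One invocation of the drain-and-wait loop for a terminating instance.

    message: dict decoded from the SNS notification. The cluster name and
    container instance ARN are cached in it (hypothetical keys) so that
    later invocations do not repeat the lookups.
    """
    cluster = message["cluster"]
    instance_arn = message["container_instance_arn"]

    if not message.get("draining"):
        ecs.update_container_instances_state(
            cluster=cluster, containerInstances=[instance_arn],
            status="DRAINING")
        message = dict(message, draining=True)

    tasks = ecs.list_tasks(cluster=cluster, containerInstance=instance_arn)
    if tasks["taskArns"]:
        # Tasks still running: pause to avoid API throttling, then
        # re-publish the enriched message so the function runs again.
        sleep(pause_s)
        sns.publish(TopicArn=topic_arn, Message=json.dumps(message))
        return "reinvoked"

    # Nothing left on the instance: let the Auto Scaling group proceed.
    asg.complete_lifecycle_action(
        LifecycleHookName=message["LifecycleHookName"],
        AutoScalingGroupName=message["AutoScalingGroupName"],
        LifecycleActionResult="CONTINUE",
        InstanceId=message["EC2InstanceId"])
    return "completed"
```

Carrying the `draining` flag and the cached lookups inside the message is what keeps each re-invocation from repeating API calls.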
Automate AWS CloudFormation stacks
• Manual is scary
• Store AWS CloudFormation stacks in Git
• GitLab CI jobs triggering updates
• Benefit from stack updates (rollback)
• Promote changes from DEV toward PROD safely
Automate AWS CloudFormation stacks
• ECS services and tasks as AWS CloudFormation stacks
• Python code generates stack from template and configs
• Triggers stack update
• Benefit from stack updates (rollback)
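
Generating a stack from a per-service config could look like the Python sketch below. It is a minimal illustration, not the team's generator: `build_service_stack`, `render_template_body`, and every config key are hypothetical, and only a few properties of `AWS::ECS::TaskDefinition` and `AWS::ECS::Service` are filled in.

```python
import json


def build_service_stack(config):
    """Build a CloudFormation template (as a dict) for one ECS service.

    config keys (all hypothetical): name, image, memory, container_port,
    cluster, desired_count.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": f"ECS service {config['name']}",
        "Resources": {
            "TaskDefinition": {
                "Type": "AWS::ECS::TaskDefinition",
                "Properties": {
                    "Family": config["name"],
                    "ContainerDefinitions": [{
                        "Name": config["name"],
                        "Image": config["image"],
                        "Memory": config["memory"],
                        "PortMappings": [
                            {"ContainerPort": config["container_port"]}
                        ],
                    }],
                },
            },
            "Service": {
                "Type": "AWS::ECS::Service",
                "Properties": {
                    "Cluster": config["cluster"],
                    "DesiredCount": config["desired_count"],
                    "TaskDefinition": {"Ref": "TaskDefinition"},
                },
            },
        },
    }


def render_template_body(config):
    """Serialize the template deterministically so Git diffs stay small."""
    return json.dumps(build_service_stack(config), indent=2, sort_keys=True)
```

The rendered body could then be handed to a CloudFormation stack update, which is where the rollback benefit noted on the slide comes from.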
Tunnel vs. proxy vs. internet
UAT on-premises endpoints are not on the public internet
VPC/VPN: Reach internal on-premises endpoints using the VPN
- VPN can flap; it must be monitored
- VPN can become a bottleneck (unsuited for high traffic)
- You need a special DNS configuration to use an on-premises DNS to resolve private domains
+ Works for us
Tunnel vs. proxy vs. internet
UAT on-premises endpoints are not on the public internet
Proxies: Whitelist two Elastic IPs and allow traffic from these to reach protected endpoints
- Need to manage proxies
- Need to whitelist IPs in corporate firewall
+ Worked for other projects
LIVE uses public endpoints on the Internet
For Honor UAT block diagram
[Diagram. Components shown: game clients, HAProxy + Route 53, front-end ECS, backend ECS, service discovery ECS, ancillary ECS, supporting services, Amazon CloudFront, S3 (Faction War world state), ElastiCache (Redis), AWS Lambda, Amazon Elasticsearch Service, VPN tunnel, on-premises data center.]
Looking back after a break
Automation
Managed infrastructure
Development ease
Fail fast! Succeed consistently!
Lessons learned
• Everything manual is a risk
• Break-even point of automation changes over time
• Validate all changes in noncritical but identical setups
• AWS CloudFormation can have a mind of its own
• Service containers are hard to manage (even with placement constraints)
• Surprising gaps in offerings: Lambdas can duct tape a lot of features cheaply
• Cheap in dev and operations
• Invest in Lambda CI/CD tooling; it can get messy
• Use managed services (Elasticsearch, ElastiCache, SQS, Lambda, etc.)