
(DVO312) Sony: Building At-Scale Services with AWS Elastic Beanstalk

Learn about Sony's efforts to build a cloud-native authentication and profile management platform on AWS. Sony engineers demonstrate how they used AWS Elastic Beanstalk (Elastic Beanstalk) to deploy, manage, and scale their applications. They also describe how they use AWS CloudFormation for resource provisioning, Amazon DynamoDB for the main database, and AWS Lambda and Amazon Redshift for log handling and analysis. This discussion focuses on best practices, security considerations, tradeoffs, and the final architecture and implementation. By the end of the session, you will clearly understand how to use Elastic Beanstalk as a platform to quickly and easily build at-scale web applications on AWS, and how to use Elastic Beanstalk with other AWS services to build cloud-native applications.

Published in: Technology

  1. 1. DVO312: Building At-Scale Services with AWS Elastic Beanstalk. Build a Cloud-native Authentication and Profile Management Platform on AWS. Sumio Okada, Engineer, Sony; Shinya Kawaguchi, Engineer, Sony. October 2015. © 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
  2. 2. What to expect from the session You will learn how to use AWS Elastic Beanstalk: • As a platform to easily build customized web applications at scale on AWS. • To seamlessly build cloud-native applications with other AWS services.
  3. 3. Agenda • Introduction • Architecture • Implementation • Conclusion
  4. 4. Introduction
  5. 5. Who are we? We provide cloud solutions for Sony products and applications. TV Side View Smart Tennis Sensor Smart B-Trainer Play Memories Online
  6. 6. Previous platform An incident
  7. 7. Previous platform • Built on top of IaaS • Self-managed ‘base services’ • Monolithic system
  8. 8. Motivation for the rebuild • Agility • Robustness • Efficiency
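  9. 9. Achievement - agility

      | Item                 | Before     | After                           |
      |----------------------|------------|---------------------------------|
      | Deployment time      | Half a day | 40 min., zero downtime release  |
      | Release trouble rate | 30%        | 0%                              |
      | Release interval     | Bi-weekly  | NA (on demand)                  |

  10. 10. Achievement - robustness

      | Item                 | Before                             | After                            |
      |----------------------|------------------------------------|----------------------------------|
      | Access surges impact | Unstable or down                   | No impact                        |
      | IaaS trouble impact  | Service damage; emergency operation | No impact; auto recover/healing |
      | Related service down | Affecting an entire system         | Minimum impact                   |

  11. 11. Achievement - efficiency

      | Item                 | Before                   | After                        |
      |----------------------|--------------------------|------------------------------|
      | Config management    | Manual                   | Git (Infrastructure as Code) |
      | Infra for management | 7+ self-managed services | 0                            |
      | Scaling              | Not flexible             | Auto Scaling                 |
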
  12. 12. Architecture
  13. 13. Auth & Profile Mutually independent microservices Service Providers Frontend Backend Third party Authentication Services
  14. 14. Service Providers Third party Authentication Services Backend Authentication and profile management system Frontend Auth & Profile
  15. 15. System overview Authentication and profile management system - 1 Public PublicPrivatePublic PrivatePublic AZ-2 us-west2 AZ-1 NAT NAT HA Service Providers NATAPI NATAPI S3 Data Pipeline Batch EC2 Resource Batch Config Log Backup Profile DB DynamoDB API Call DynamoDB/S3 Route53 Third party Authentication Services
  16. 16. System overview Authentication and profile management system - 2 Public PublicPrivatePublic PrivatePublic AZ-2 us-west2 Route53 AZ-1 S3 Service Providers API Call DynamoDB/S3 Data Pipeline Batch EC2 Resource NAT NATAPI NATAPI NAT Batch Config Log Backup Profile DB DynamoDB HA Third party Authentication Services
  17. 17. us-west2 System overview – CloudFormation Base layer Public PublicPrivatePublic PrivatePublic AZ-2 AZ-1 S3 NAT NAT Profile DB Dynamo DB CloudFormation HA
  18. 18. Public PublicPrivatePublic PrivatePublic AZ-2 us-west2 AZ-1 S3 NAT NAT Profile DB Dynamo DB HA System overview - Elastic Beanstalk Application layer Elastic Beanstalk NATAPI NATAPI
  19. 19. Continuous delivery system Code Repository Development Push Code 3 Build Kick off 4 Unit Test 5 Push Image 6 Provision & Deploy 7 Sanity Test Result Delivery system without self-managed infrastructure 1 2 3 4 6 7 8 Development QA5 Integration Test5 Get Image Production
  20. 20. Throttling and Circuit Breaker Self-defense for robustness Throttling Circuit Breaker APIs Throttling Circuit Breaker Third party Authentication Services
  21. 21. Zero-management infrastructure EC2 Cloud Watch, Logs SNS S3 Lambda Redshift Targets Monitoring Metrics Notification / Communication Log Analysis Logs Import Logs, Metrics
  22. 22. Implementation
  23. 23. Authentication& ProfileManagement Platform Implementation - motivation Reproducible Scalable Highly available and fault tolerant Secure and robust Transparent
  25. 25. Infrastructure as code • Automated operations • Version control • Continuous delivery
  26. 26. Infrastructure as code • Versioning: • CloudFormation templates • Elastic Beanstalk configuration files (.ebextensions/*.config) • Application/environment configuration files • Automation scripts
  27. 27. Authentication& ProfileManagement Platform Implementation - motivation Reproducible Scalable Highly available and fault tolerant Secure and robust Transparent
  28. 28. Auto Scaling based on custom metric • Custom Metric via Data Pipeline AppApp Alarms ELB Metrics ELB Metrics CloudWatch Data Pipeline Auto Scaling group Custom Metric (Successful Response Rate per Instance)
  29. 29. Auto Scaling based on custom metric
      • Custom scaling policies via .ebextensions

            Resources:
              AutoScalingScaleOutPolicy:
                Type: AWS::AutoScaling::ScalingPolicy
                Properties:
                  AdjustmentType: ChangeInCapacity
                  AutoScalingGroupName: { "Ref" : "AWSEBAutoScalingGroup" }
                  ScalingAdjustment: 2
              AutoScalingScaleOutAlarm:
                Type: AWS::CloudWatch::Alarm
                Properties:
                  Namespace: { "Fn::GetOptionSetting" : { "OptionName" : "AutoScalingMetricNamespace" } }
                  MetricName: { "Fn::GetOptionSetting" : { "OptionName" : "AutoScalingMetricName" } }
                  Dimensions: [ { "Name" : "LoadBalancerName", "Value" : { "Ref" : "AWSEBLoadBalancer" } } ]
                  ...
                  AlarmActions: [ { "Ref" : "AutoScalingScaleOutPolicy" } ]

  30. 30. Auto Scaling based on custom metric
      • Disable default scaling policies via .ebextensions

            Resources:
              AWSEBCloudwatchAlarmHigh:
                Type: AWS::CloudWatch::Alarm
                Properties:
                  AlarmActions: [ { "Ref" : "AWS::NoValue" } ]
              AWSEBCloudwatchAlarmLow:
                Type: AWS::CloudWatch::Alarm
                Properties:
                  AlarmActions: [ { "Ref" : "AWS::NoValue" } ]

  31. 31. Authentication& ProfileManagement Platform Implementation - motivation Reproducible Scalable Highly available and fault tolerant Secure and robust Transparent
  32. 32. High availability for application • Zero downtime deployment • Auto healing based on deep health check • Disk space shortage prevention
  33. 33. Zero downtime deployment Auto Scaling group • Rolling deployments • Update application instances one by one Batch Batch Batch App Working App Working App Working
  34. 34. Zero downtime deployment Auto Scaling group • Rolling deployments • Update application instances one by one Batch Batch Batch App Working App Working App Updating
  35. 35. Zero downtime deployment
      • Rolling deployments via .ebextensions

            option_settings:
              "aws:elasticbeanstalk:command":
                BatchSizeType: Fixed
                BatchSize: 1

  36. 36. Zero downtime deployment Conflict between rolling deployments and scaling out • Taken care of by Elastic Beanstalk
  37. 37. Zero downtime deployment • Rolling updates • Dynamic batch size Auto Scaling group MinSize 2 MaxSize 10 Batch Batch App Working App Working App Working App Working Increased by scaling out
  38. 38. Zero downtime deployment • Rolling updates • Keep the number of in-service instances Auto Scaling group MinSize 2 MaxSize 10 Batch Batch App Working App Working App Working App Working New Launching New Launching
  39. 39. Zero downtime deployment • Rolling updates • Keep the number of in-service instances Auto Scaling group MinSize 2 MaxSize 10 BatchApp Working App Working New Launching New Launching BatchNew Working New Working App Terminating App Terminating
  40. 40. Zero downtime deployment
      • Rolling updates via .ebextensions

            option_settings:
              "aws:autoscaling:updatepolicy:rollingupdate":
                RollingUpdateEnabled: true
                MaxBatchSize: <num of running instances> / 2       # e.g. 2
                MinInstancesInService: <num of running instances>  # e.g. 4

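The two placeholder values in the rolling-update settings can be derived from the current instance count. The following is a minimal sketch, not the presenters' tooling: the function name `rolling_update_params` is hypothetical, and clamping the batch size to at least 1 is an assumption for the single-instance case.

```shell
#!/bin/bash
# Hypothetical helper: derive rolling-update settings from the number of
# running instances, following the rule of thumb on the slide.
rolling_update_params() {
  local running=$1
  # MaxBatchSize: half of the running instances (integer division),
  # clamped to 1 (assumption: a batch of 0 would never make progress).
  local max_batch=$(( running / 2 ))
  if [ "$max_batch" -lt 1 ]; then max_batch=1; fi
  # MinInstancesInService: keep the full current capacity in service.
  echo "MaxBatchSize=${max_batch} MinInstancesInService=${running}"
}

rolling_update_params 4
```

With four running instances this reproduces the slide's example values (batch size 2, four instances kept in service).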
  41. 41. Zero downtime deployment Tradeoff • Rolling deployments/updates Definite app version switching Low tolerance to deployment failure (rolling deployments)
  42. 42. Zero downtime deployment Tradeoff • Rolling deployments/updates Definite app version switching Low tolerance to deployment failure (rolling deployments) • CNAME swap High tolerance to deployment failure DNS propagation
  44. 44. Auto healing based on deep health check • Deep health check • Accuracy of system time • Accessibility to main database (DynamoDB)
  45. 45. Auto healing based on deep health check
      • Deep health check configuration via .ebextensions

            option_settings:
              "aws:elasticbeanstalk:application":
                "Application Healthcheck URL": /1/status
              "aws:elb:healthcheck":
                Interval: 15
                Timeout: 10
                HealthyThreshold: 3
                UnhealthyThreshold: 3

  46. 46. Auto healing based on deep health check
      • Auto healing configuration via .ebextensions

            Resources:
              AWSEBAutoScalingGroup:
                Type: AWS::AutoScaling::AutoScalingGroup
                Properties:
                  HealthCheckType: ELB

  47. 47. Auto healing based on deep health check Rolling deployments with auto healing configuration Problem • Unexpected instance termination caused by Elastic Beanstalk
  48. 48. Auto healing based on deep health check Rolling deployments with auto healing configuration Problem • Unexpected instance termination caused by Elastic Beanstalk Workaround • Suspend HealthCheck process in AWSEBAutoScalingGroup during rolling deployments
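The workaround can be sketched with the AWS CLI's `autoscaling suspend-processes` and `resume-processes` commands. This is an illustrative wrapper, not the presenters' script: the function name `with_healthcheck_suspended` is hypothetical, and the caller is assumed to already know the name of the Auto Scaling group that Elastic Beanstalk created and to pass the deployment command as arguments.

```shell
#!/bin/bash
# Hypothetical wrapper: suspend the HealthCheck process of an Auto Scaling
# group, run a deployment command, then resume the process.
# Usage: with_healthcheck_suspended <asg-name> <command> [args...]
with_healthcheck_suspended() {
  local asg_name=$1
  shift
  aws autoscaling suspend-processes \
    --auto-scaling-group-name "$asg_name" \
    --scaling-processes HealthCheck || return 1
  # Run the deployment command given as the remaining arguments.
  "$@"
  local status=$?
  # Resume even when the deployment failed, so auto healing comes back.
  aws autoscaling resume-processes \
    --auto-scaling-group-name "$asg_name" \
    --scaling-processes HealthCheck
  return "$status"
}
```

Resuming unconditionally matters: leaving HealthCheck suspended after a failed deployment would silently disable the auto healing described above.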
  49. 49. Disk space shortage prevention • Docker image local cache size 0% 20% 40% 60% 80% 100% 1 2 … n Free Docker Image Local Cache System Rolling Deployments DiskUsage Pulling new layers
  50. 50. Disk space shortage prevention
      • Remove unused Docker images via .ebextensions

            files:
              "/opt/elasticbeanstalk/hooks/appdeploy/post/99_01_remove-unused-docker-images.sh":
                mode: "000755"
                owner: root
                group: root
                content: |
                  #!/bin/bash
                  # Remove images other than the aws_beanstalk ones. Failures are
                  # ignored so that the deploy hook itself never fails.
                  # \$1..\$3 are escaped so that awk, not the inner shell,
                  # receives the field references.
                  docker images | grep -v "aws_beanstalk/" | grep -v "REPOSITORY" | xargs -I {} /bin/bash -c '
                    repository=$(echo "{}" | awk "{ print \$1 }")
                    tag=$(echo "{}" | awk "{ print \$2 }")
                    image_id=$(echo "{}" | awk "{ print \$3 }")
                    docker rmi $image_id || docker tag $image_id $repository:$tag || true
                  ' || true

  51. 51. Disk space shortage prevention • Docker container log size • Container logs captured by Elastic Beanstalk • /var/log/eb-docker/containers/eb-current-app/*-stdouterr.log • Original container logs • /var/lib/docker/containers/<cid>/<cid>-json.log
  52. 52. Disk space shortage prevention • Docker container log size • Container logs captured by Elastic Beanstalk Rotated • Original container logs Keeps growing in size
  53. 53. Disk space shortage prevention
      • Docker container logs truncation via .ebextensions

            files:
              "/etc/cron.hourly/cron.logtruncate.docker.json.log.conf":
                mode: "000755"
                owner: root
                group: root
                content: |
                  #!/bin/sh
                  # truncate docker container logs here.
                  # see appendix for the actual script implementation.
                  ...

  54. 54. High availability for NAT • NAT instance in AutoScalingGroup • Periodic route table monitoring
  55. 55. NAT instance in AutoScalingGroup • Static resources created via CloudFormation Public Subnet Public Subnet Private Subnet for Apps Private Subnet for Apps AZ-2 AWS Region AZ-1 tag:NetworkSegment NAT-A tag:NetworkSegment NAT-B Internet MinSize 1 MaxSize 1 MinSize 1 MaxSize 1
  56. 56. NAT instance in AutoScalingGroup • Dynamic NAT instances Public Subnet Public Subnet Private Subnet for Apps Private Subnet for Apps AZ-2 AWS Region AZ-1 NAT Pending NAT Pending tag:NetworkSegment NAT-A Public IP Internet tag:NetworkSegment NAT-B Public IP tag:NetworkSegment NAT-A tag:NetworkSegment NAT-B AutoScalingGroup launches new NAT instance.
  57. 57. NAT instance in AutoScalingGroup • Dynamic NAT instance configuration via cloud-init Public Subnet Public Subnet Private Subnet for Apps Private Subnet for Apps AZ-2 AWS Region AZ-1 NAT Running NAT Running tag:NetworkSegment NAT-A Elastic IP Internet tag:NetworkSegment NAT-B Elastic IP tag:NetworkSegment NAT-A tag:NetworkSegment NAT-B Disable SRC/DST check, Assign Elastic IP, etc...
  58. 58. NAT instance in AutoScalingGroup • Route table lookup Public Subnet Public Subnet Private Subnet for Apps Private Subnet for Apps AZ-2 AWS Region AZ-1 NAT Running NAT Running Internet New NAT Instance looks up route tables based on tag. tag:NetworkSegment NAT-A tag:NetworkSegment NAT-B tag:NetworkSegment NAT-A Elastic IP tag:NetworkSegment NAT-B Elastic IP
  59. 59. NAT Instance in AutoScalingGroup • Dynamic route configuration Public Subnet Public Subnet Private Subnet for Apps Private Subnet for Apps AZ-2 AWS Region AZ-1 NAT Running NAT Running tag:NetworkSegment NAT-A tag:RoutingStatus OK tag:NetworkSegment NAT-B tag:RoutingStatus OK Internet tag:NetworkSegment NAT-A Elastic IP tag:NetworkSegment NAT-B Elastic IP
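The start-up sequence shown across these slides (disable the source/destination check, look up route tables by tag, point the default route at the new instance) could be scripted roughly as follows. This is a sketch under stated assumptions, not the presenters' cloud-init script: the function name `configure_nat_routes` is hypothetical, the region, segment name, and instance ID are supplied by the caller, and tagging the route table with `RoutingStatus` is one reading of the diagram.

```shell
#!/bin/bash
# Hypothetical NAT bootstrap step.
# Usage: configure_nat_routes <region> <network-segment> <instance-id>
configure_nat_routes() {
  local region=$1 segment=$2 instance_id=$3
  # A NAT instance forwards traffic that is not addressed to itself.
  aws --region "$region" ec2 modify-instance-attribute \
    --instance-id "$instance_id" --no-source-dest-check || return 1
  # Find the route tables that carry this NAT's NetworkSegment tag.
  local tables
  tables=$(aws --region "$region" --output text ec2 describe-route-tables \
    --filters "Name=tag:NetworkSegment,Values=${segment}" \
    --query 'RouteTables[].RouteTableId') || return 1
  local rtb
  for rtb in $tables; do
    # Point the default route at this instance; create it if it is absent.
    aws --region "$region" ec2 replace-route --route-table-id "$rtb" \
      --destination-cidr-block 0.0.0.0/0 --instance-id "$instance_id" ||
    aws --region "$region" ec2 create-route --route-table-id "$rtb" \
      --destination-cidr-block 0.0.0.0/0 --instance-id "$instance_id" ||
    return 1
    # Assumption: the RoutingStatus tag lives on the route table.
    aws --region "$region" ec2 create-tags --resources "$rtb" \
      --tags Key=RoutingStatus,Value=OK
  done
}
```

Looking route tables up by tag rather than by ID is what lets a freshly launched instance configure itself without any per-instance parameters.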
  60. 60. Periodic route table monitoring • Running normally Public Subnet Public SubnetPrivate Subnet Private Subnet AZ-2 AWS Region AZ-1 NAT Running NATApp NATApp NAT Running tag:NetworkSegment NAT-A tag:RoutingStatus OK tag:NetworkSegment NAT-B tag:RoutingStatus OK 0.0.0.0/0 Active tag:NetworkSegment NAT-A Internet 0.0.0.0/0 Active tag:NetworkSegment NAT-B NAT Instances monitor route tables located in different AZs periodically.
  61. 61. Periodic route table monitoring • Black hole route detection Public Subnet Public SubnetPrivate Subnet Private Subnet AZ-2 AWS Region AZ-1 NAT Terminated NATApp NATApp NAT Running tag:NetworkSegment NAT-A tag:RoutingStatus OK tag:NetworkSegment NAT-B tag:RoutingStatus OK 0.0.0.0/0 Black Hole tag:NetworkSegment NAT-A Internet 0.0.0.0/0 Active tag:NetworkSegment NAT-B Healthy NAT Instance detects blackhole internet route.
  62. 62. AWS Region Periodic route table monitoring • Outbound traffic takeover Public Subnet Public SubnetPrivate Subnet Private Subnet AZ-2 AZ-1 NAT Terminated NATApp NATApp NAT Running tag:NetworkSegment NAT-A tag:RoutingStatus TakenOver tag:NetworkSegment NAT-B tag:RoutingStatus OK Internet 0.0.0.0/0 Active Healthy NAT Instance takes over outbound traffic to internet. tag:NetworkSegment NAT-A tag:NetworkSegment NAT-B
  63. 63. AWS Region Periodic route table monitoring • Outbound traffic takeover Public Subnet Public SubnetPrivate Subnet Private Subnet AZ-2 AZ-1 NAT Terminated NATApp NATApp NAT Running tag:NetworkSegment NAT-A tag:RoutingStatus TakenOver tag:NetworkSegment NAT-B tag:RoutingStatus OK Internet 0.0.0.0/0 Active NAT Pending tag:NetworkSegment NAT-A AutoScalingGroup launches new NAT instance. tag:NetworkSegment NAT-B
  64. 64. AWS Region Periodic route table monitoring • Route table lookup Public Subnet Public SubnetPrivate Subnet Private Subnet AZ-2 AZ-1 NAT Terminated NATApp NATApp NAT Running tag:NetworkSegment NAT-A tag:RoutingStatus TakenOver tag:NetworkSegment NAT-B tag:RoutingStatus OK Internet 0.0.0.0/0 Active NAT Running tag:NetworkSegment NAT-A tag:NetworkSegment NAT-B New NAT Instance looks up route tables based on tag.
  65. 65. AWS Region Periodic route table monitoring • Outbound traffic recovery Public Subnet Public SubnetPrivate Subnet Private Subnet AZ-2 AZ-1 NAT Terminated NATApp NATApp NAT Running tag:NetworkSegment NAT-A tag:RoutingStatus OK tag:NetworkSegment NAT-B tag:RoutingStatus OK tag:NetworkSegment NAT-B Internet 0.0.0.0/0 Active NAT Running tag:NetworkSegment NAT-A New NAT Instance recovers internet route. 0.0.0.0/0 Active
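The periodic check that a healthy NAT instance runs against the other Availability Zone's route tables could look like the sketch below. It is illustrative only: the function name `takeover_blackhole_routes` is hypothetical, the peer segment name and instance ID come from the caller, and the `RoutingStatus` tag is assumed to be set on the route table. The `route.state` filter matches any blackhole route in a table, not only `0.0.0.0/0`, which is acceptable here on the assumption that the default route is the only one pointing at a NAT instance.

```shell
#!/bin/bash
# Hypothetical periodic check, e.g. run from cron on each NAT instance.
# Usage: takeover_blackhole_routes <region> <peer-segment> <instance-id>
takeover_blackhole_routes() {
  local region=$1 peer_segment=$2 instance_id=$3
  # Route tables of the peer segment that contain a blackhole route.
  local tables
  tables=$(aws --region "$region" --output text ec2 describe-route-tables \
    --filters "Name=tag:NetworkSegment,Values=${peer_segment}" \
              "Name=route.state,Values=blackhole" \
    --query 'RouteTables[].RouteTableId') || return 1
  local rtb
  for rtb in $tables; do
    # Take over the default route so the peer's private subnet keeps
    # outbound connectivity while its own NAT instance is replaced.
    aws --region "$region" ec2 replace-route --route-table-id "$rtb" \
      --destination-cidr-block 0.0.0.0/0 --instance-id "$instance_id" ||
    return 1
    aws --region "$region" ec2 create-tags --resources "$rtb" \
      --tags Key=RoutingStatus,Value=TakenOver
  done
}
```

When no blackhole route exists the loop body never runs, so the check is safe to repeat on every interval.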
  66. 66. Periodic route table monitoring Network capacity planning for NAT instances • Need to consider total amount of outbound traffic coming from application instances across Availability Zones
  67. 67. Authentication& ProfileManagement Platform Implementation - motivation Reproducible Scalable Highly available and fault tolerant Secure and robust Transparent
  68. 68. Source IP address whitelisting • Without whitelisting AWSEBLoadBalancerSecurityGroup No Inbound Rules App App App x.x.x.1 x.x.x.6x.x.x.5 Applied by Elastic Beanstalk AWSEBLoadBalancer
  69. 69. Source IP address whitelisting • With whitelisting ip-whitelist-group1-1 HTTPS TCP 443 x.x.x.1/32 … AWSEBLoadBalancerSecurityGroup No Inbound Rules ip-whitelist-group1-2 HTTPS TCP 443 x.x.x.2/32 ip-whitelist-group1-3 HTTPS TCP 443 x.x.x.3/32 ip-whitelist-group1-4 HTTPS TCP 443 x.x.x.4/32 Configuration files tag:IPWhitelistGroup DefaultGroup tag:IPWhitelistGroup Group1 tag:IPWhitelistGroup Group1 App App App x.x.x.1 x.x.x.6 Rules Rules Rules Rules x.x.x.5 Applied via script SecurityGroups Max 200 (4*50) rules are available AWSEBLoadBalancer Add rules via script
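The "4*50" limit on the slide implies that whitelist entries must be spread over several security groups. A minimal sketch of that distribution step follows; the function name `assign_whitelist_groups` and its input format (one CIDR per line on stdin) are assumptions, and the sketch only computes the assignment without calling AWS.

```shell
#!/bin/bash
# Limits taken from the slide: 4 security groups x 50 rules = 200 rules.
RULES_PER_GROUP=50
MAX_GROUPS=4

# Hypothetical helper: reads one CIDR per line on stdin and prints
# "<security-group-name> <cidr>" for each entry.
# Usage: assign_whitelist_groups <group-name-prefix>
assign_whitelist_groups() {
  local prefix=$1 count=0 cidr group
  while read -r cidr; do
    [ -z "$cidr" ] && continue
    # Entries 1-50 go to group 1, 51-100 to group 2, and so on.
    group=$(( count / RULES_PER_GROUP + 1 ))
    if [ "$group" -gt "$MAX_GROUPS" ]; then
      echo "too many entries: limit is $(( RULES_PER_GROUP * MAX_GROUPS ))" >&2
      return 1
    fi
    echo "${prefix}-${group} ${cidr}"
    count=$(( count + 1 ))
  done
}
```

For example, `printf '192.0.2.1/32\n' | assign_whitelist_groups ip-whitelist-group1` maps the entry to `ip-whitelist-group1-1`, matching the group naming on the slide.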
  70. 70. Source IP address whitelisting
      • Tagging built-in resources via .ebextensions

            Resources:
              AWSEBLoadBalancer:
                Type: AWS::ElasticLoadBalancing::LoadBalancer
                Properties:
                  Tags:
                    - { Key: IPWhitelistGroup, Value: Group1 }
              AWSEBLoadBalancerSecurityGroup:
                Type: AWS::EC2::SecurityGroup
                Properties:
                  GroupDescription: "Load Balancer Security Group"
                  VpcId: { "Fn::GetOptionSetting" : { "OptionName" : "VPCId" } }
                  Tags:
                    - { Key: IPWhitelistGroup, Value: DefaultGroup }

  71. 71. Source IP address whitelisting
      • Fill required properties in security group for ELB via .ebextensions

            Resources:
              AWSEBLoadBalancer:
                Type: AWS::ElasticLoadBalancing::LoadBalancer
                Properties:
                  Tags:
                    - { Key: IPWhitelistGroup, Value: Group1 }
              AWSEBLoadBalancerSecurityGroup:
                Type: AWS::EC2::SecurityGroup
                Properties:
                  GroupDescription: "Load Balancer Security Group"
                  VpcId: { "Fn::GetOptionSetting" : { "OptionName" : "VPCId" } }
                  Tags:
                    - { Key: IPWhitelistGroup, Value: DefaultGroup }

      Specifying GroupDescription and VpcId is also required in order to modify the AWSEBLoadBalancerSecurityGroup resource via .ebextensions.
  72. 72. Connection/request throttling • Throttling per client (source IP address) Amazon Linux Docker Container App APIs Internal Service External Services Over Limit Over Limit Third party Authentication Services
  73. 73. Internal Service Connection/request throttling • Throttling per remote user (internal service) Amazon Linux Docker Container External ServicesOver Limit Over Limit Internal Service App APIs Third party Authentication Services
  74. 74. Connection/request throttling
      • nginx configuration file installation via .ebextensions

            files:
              "/etc/nginx/throttling/limit-zone-def.conf":
                mode: "000644"
                owner: root
                group: root
                content: |
                  # include in http context
                  limit_conn_zone $http_x_forwarded_for zone=conn_perclient:10m;
                  limit_conn_zone $hostname zone=conn_total:1m;
                  limit_conn_status 429;
                  limit_req_zone $remote_user zone=req_perservice:10m rate=150r/s;
                  limit_req_zone $hostname zone=req_total:1m rate=200r/s;
                  limit_req_status 429;

  75. 75. Connection/request throttling
      • nginx configuration file installation via .ebextensions

            files:
              "/etc/nginx/throttling/limit-per.conf":
                mode: "000644"
                owner: root
                group: root
                content: |
                  # include in location context
                  limit_conn conn_perclient 75;
                  limit_req zone=req_perservice burst=300 nodelay;

  76. 76. Connection/request throttling
      • nginx configuration file installation via .ebextensions

            files:
              "/etc/nginx/throttling/limit-total.conf":
                mode: "000644"
                owner: root
                group: root
                content: |
                  # include in location context
                  limit_conn conn_total 300;
                  limit_req zone=req_total burst=400 nodelay;

  77. 77. Connection/request throttling
      • nginx configuration script (.ebextensions/nginx-conf.sh)

            #!/bin/bash
            EB_CONFIG_HTTP_PORT=$(/opt/elasticbeanstalk/bin/get-config container -k instance_port)
            cat > /etc/nginx/sites-available/nginx-docker-proxy.conf <<EOF
            ...
            include throttling/limit-zone-def.conf;
            server {
              listen $EB_CONFIG_HTTP_PORT;
              location / {
                ...
                include throttling/limit-per.conf;
                include throttling/limit-total.conf;
              }
              location ~ /.+?/status {
                ...
                include throttling/limit-per.conf;
              }
            }
            EOF
            rm -f /etc/nginx/sites-enabled/*
            ln -sf /etc/nginx/sites-available/nginx-docker-proxy.conf /etc/nginx/sites-enabled/

  78. 78. Connection/request throttling
      • nginx configuration via .ebextensions

            container_commands:
              nginx-conf-for-throttling:
                command: 'bash .ebextensions/nginx-conf.sh'

  79. 79. Connection/request throttling Tradeoff Advantages taken from throttling Low compatibility
  80. 80. External Services Internal Services Circuit Breaker • Proxy object for each external service Amazon Linux Docker Container App Open Closed Closed Closed APIs Immediate failure Third party Authentication Services
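The open/closed states and the "immediate failure" path can be illustrated with a small state machine. The presenters' implementation is the Go library gobreaker listed in the appendix; the bash sketch below is not that library and only mirrors the general pattern. All names (`cb_call`, the `CB_*` variables), the failure threshold, and the open timeout are assumptions chosen for illustration.

```shell
#!/bin/bash
# Minimal circuit breaker sketch. State lives in shell variables, so
# cb_call must run in the current shell (not inside $(...) or a pipeline).
CB_STATE=closed      # closed | open | half-open
CB_FAILURES=0        # consecutive failures seen while closed
CB_MAX_FAILURES=3    # assumption: trip after 3 consecutive failures
CB_OPEN_SECONDS=30   # assumption: stay open for 30 seconds
CB_OPENED_AT=0

# Usage: cb_call <command> [args...]
# Returns the command's status, or 99 when the breaker rejects the call.
cb_call() {
  local now=${CB_NOW:-$(date +%s)}   # CB_NOW allows injecting a clock
  if [ "$CB_STATE" = open ]; then
    if [ $(( now - CB_OPENED_AT )) -lt "$CB_OPEN_SECONDS" ]; then
      return 99                      # immediate failure, command not run
    fi
    CB_STATE=half-open               # timeout elapsed: allow one trial
  fi
  "$@"
  local status=$?
  if [ "$status" -eq 0 ]; then
    CB_STATE=closed
    CB_FAILURES=0
    return 0
  fi
  CB_FAILURES=$(( CB_FAILURES + 1 ))
  # A failed trial call, or too many consecutive failures, opens the breaker.
  if [ "$CB_STATE" = half-open ] || [ "$CB_FAILURES" -ge "$CB_MAX_FAILURES" ]; then
    CB_STATE=open
    CB_OPENED_AT=$now
  fi
  return "$status"
}
```

A caller would wrap each outbound request, for instance `cb_call curl --silent --fail "$url"`, and treat status 99 as "service unavailable, do not retry yet". One breaker per external service, as on the slide, would need one set of state variables per service.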
  81. 81. Authentication& ProfileManagement Platform Implementation - motivation Reproducible Scalable Highly available and fault tolerant Secure and robust Transparent
  82. 82. Comprehensive log monitoring Cloud Watch, Logs SNS S3 Lambda Redshift Targets Monitoring Metrics Notification / Communication Log Analysis Logs Import Logs, Metrics AppNAT
  83. 83. Comprehensive log monitoring
      • LogGroup creation via .ebextensions

            Resources:
              CWLSyslogMessagesLogGroup:
                Type: "AWS::Logs::LogGroup"
                DependsOn: AWSEBBeanstalkMetadata
                Properties:
                  LogGroupName: { "Fn::Join" : [ "-", [ { "Ref" : "AWSEBEnvironmentName" }, "syslog-messages" ] ] }
                  RetentionInDays: 14

  84. 84. Comprehensive log monitoring
      • CloudWatch Logs agent config file via .ebextensions

            Resources:
              AWSEBAutoScalingGroup:
                Metadata:
                  "AWS::CloudFormation::Init":
                    CWLogsAgentConfigSetup:
                      files:
                        "/tmp/cwlogs/conf.d/core-logs.conf":
                          content : |
                            [/var/log/messages]
                            file = /var/log/messages
                            log_group_name = `{ "Ref" : "CWLSyslogMessagesLogGroup" }`
                            log_stream_name = {instance_id}
                            datetime_format = %b %d %H:%M:%S

  85. 85. Notification / Communication Searchable log retention Cloud Watch, Logs SNS S3 Lambda Redshift Targets Monitoring Metrics Log Analysis Import Logs, Metrics AppNAT Logs
  86. 86. Notification / Communication Searchable log retention Cloud Watch, Logs SNS S3 Lambda Redshift Targets Monitoring Metrics Log Analysis Import Logs, Metrics AppNAT flush_interval 60s flush_at_shutdown true Logs
  87. 87. Searchable log retention
      • td-agent configuration via .ebextensions

            files:
              "/etc/sysconfig/td-agent":
                mode: "000644"
                owner: root
                group: root
                content: |
                  # Run as root user
                  TD_AGENT_ARGS="/usr/sbin/td-agent --group td-agent --log /var/log/td-agent/td-agent.log --use-v1-config --suppress-repeated-stacktrace"
                  DAEMON_ARGS="--user root"
            commands:
              01-prepare-installer:
                command: ... # Install td-agent installation script to /tmp/td-agent/install-td-agent-v2.sh
              02-run-installer-td-agent:
                command: bash /tmp/td-agent/install-td-agent-v2.sh
              03-setup-configration:
                command: ... # Configure log sources for td-agent
              04-restart-td-agent:
                command: service td-agent restart

  88. 88. Searchable log retention
      • Enable ELB to upload access logs to Amazon S3

            Resources:
              AWSEBLoadBalancer:
                Type: AWS::ElasticLoadBalancing::LoadBalancer
                Properties:
                  AccessLoggingPolicy:
                    S3BucketName: { "Fn::GetOptionSetting" : { "OptionName" : "LogsBucketName" } }
                    S3BucketPrefix: "elb"
                    Enabled: true
                    EmitInterval: 5 # minutes

  89. 89. Conclusion
  90. 90. Challenges and expectations • Compatibility • Ease of operation test
  91. 91. Eight trouble-free months in production with Elastic Beanstalk • Flexibility: satisfies customization needs • Reliability: no major problems • Simplicity: simplified DevOps
  92. 92. Thank you!
  93. 93. Question and answer
  94. 94. Remember to complete your evaluations!
  95. 95. Appendix
  96. 96. Sony open source software • gobreaker • Go implementation of circuit breaker • Available on GitHub • https://github.com/sony/gobreaker • Feel free to submit pull requests and raise issues on the GitHub project
  97. 97. Sony open source software • Sonyflake • Go implementation of distributed unique ID generator • Available on GitHub • https://github.com/sony/sonyflake • Small utility for AWS (VPC) included • Example running on EB provided • Feel free to submit pull requests and raise issues on the GitHub project
  98. 98. Articles • Continuous Delivery with Golang and Docker • https://circleci.com/stories/sony
  99. 99. References • Advanced network automation • (ARC401) Black-Belt Networking for the Cloud Ninja | AWS re:Invent 2014 • Docker container log rotation • https://github.com/docker/docker/issues/7333 • https://docs.docker.com/reference/logging/overview/
  100. 100. Auto Scaling design Scale out timing chart Execute Policy Running In ServiceOut of Service App Startup ELB Determination Health Check Grace Period Deployment In Service Dead Line Resume Auto Scaling EC2 State ELB Instance State Cooldown Period (scale out policy) Register Instance Pending Auto Scaling Timers * in the case of HealthCheckType: ELB
  101. 101. Auto Scaling design Scale out timing parameters Execute Policy Running In ServiceOut of Service App Startup 45 ELB Determination HealthCheck Interval x HealthyThreshold Health Check Grace Period 600 Deployment In Service Dead Line Resume Auto Scaling Margin 300 Margin for Balancing & Metric EC2 State ELB Instance State Cooldown Period (scale out policy) 900 300 avg. 15 3 300 Register Instance Pending Auto Scaling Timers * in the case of HealthCheckType: ELB
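The relationships between the timing parameters on this chart can be checked with simple arithmetic. The sketch below uses the values shown on the slide; the ELB determination formula (interval times healthy threshold) is stated on the slide, while treating the scale-out cooldown as grace period plus margin is an assumption that happens to match the slide's numbers (600 + 300 = 900). The function names are hypothetical.

```shell
#!/bin/bash
# Values taken from the timing-parameter slide (all in seconds, except
# the threshold, which is a count of health checks).
HEALTHCHECK_INTERVAL=15   # ELB health check interval
HEALTHY_THRESHOLD=3       # consecutive successes required
GRACE_PERIOD=600          # Health Check Grace Period
MARGIN=300                # margin for balancing and metrics

# Time the ELB needs to declare a started instance healthy.
elb_determination_seconds() {
  echo $(( HEALTHCHECK_INTERVAL * HEALTHY_THRESHOLD ))
}

# Assumed relation: cooldown = grace period + margin.
scale_out_cooldown_seconds() {
  echo $(( GRACE_PERIOD + MARGIN ))
}

echo "ELB determination: $(elb_determination_seconds)s"
echo "Scale-out cooldown: $(scale_out_cooldown_seconds)s"
```

Keeping the cooldown longer than the grace period means a second scale-out is not triggered before the instances from the first one have had a chance to enter service and influence the metric.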
  102. 102. Examples
      • Elastic IP association via cloud-init

            #!/bin/bash
            REGION=$1
            EIP_ALLOCATION_ID=$2
            INSTANCE_ID=$(curl --silent http://169.254.169.254/latest/meta-data/instance-id)
            # Poll until the instance is running, then associate the Elastic IP.
            while true; do
              INSTANCE_STATUS=$(aws --region "${REGION}" --output text ec2 describe-instance-status --instance-ids "${INSTANCE_ID}" --filters Name=instance-state-name,Values=running)
              if [[ $? = 0 && "${INSTANCE_STATUS}" != "" ]]; then
                aws --region "${REGION}" --output text ec2 associate-address --instance-id "${INSTANCE_ID}" --allocation-id "${EIP_ALLOCATION_ID}" && break
              fi
              sleep 5s
            done

  103. 103. Examples • Elastic IP association via cloud-init • The associate-address command fails if the instance is still in the pending state • Wait for the instance to reach the running state before executing the associate-address command
  104. 104. Examples • Connection draining Keep accepting requests (10~20s) ConnectionDrainingTimeout
  105. 105. Examples
      • Connection draining via .ebextensions

            option_settings:
              "aws:elb:policies":
                ConnectionDrainingEnabled: true
                ConnectionDrainingTimeout: 80 # 20 + 60 seconds

  106. 106. Examples
      • Docker container log truncation

            #!/bin/sh
            cidfile=$(/opt/elasticbeanstalk/bin/get-config container -k app_deploy_file)
            [ ! -r "${cidfile}" ] && exit 0
            cid=$(cat "${cidfile}")
            scid=${cid::12}
            dockerlog="/var/lib/docker/containers/${cid}/${cid}-json.log"
            [ ! -w "${dockerlog}" ] && exit 0
            # The eb-log file made by Elastic Beanstalk.
            eblog="/var/log/eb-docker/containers/eb-current-app/${scid}-stdouterr.log"
            # PID of docker logs command related to the Container-ID.
            logspids=$(ps aux | grep "docker logs -f ${scid}" | grep -v grep | awk '{print $2}')
            for logspid in ${logspids}
            do
              # Count FD of docker logs related to the eb-log file.
              eblogfd=$(lsof -p ${logspid} | grep "${eblog}" | wc -l)
              # Expect to be redirected stdout and stderr to the eb-log file.
              [ ! ${eblogfd} -eq 2 ] && continue
              # Now, can truncate the docker-log file.
              cat /dev/null > ${dockerlog}
              break
            done

  107. 107. Examples
      • Run ntpd in slew mode via .ebextensions

            files:
              "/etc/sysconfig/ntpd":
                mode: "000644"
                owner: root
                group: root
                content: |
                  OPTIONS="-g -x"
            commands:
              "ntpd-service-restart":
                command: service ntpd restart

  108. 108. Examples
      • Scaling event notification via .ebextensions

            Resources:
              AWSEBAutoScalingGroup:
                Type: AWS::AutoScaling::AutoScalingGroup
                Properties:
                  HealthCheckType: ELB
                  NotificationConfiguration:
                    TopicARN: { "Fn::GetOptionSetting" : { "OptionName" : "ASGTopicArn" } }
                    NotificationTypes:
                      - autoscaling:EC2_INSTANCE_LAUNCH
                      - autoscaling:EC2_INSTANCE_LAUNCH_ERROR
                      - autoscaling:EC2_INSTANCE_TERMINATE
                      - autoscaling:EC2_INSTANCE_TERMINATE_ERROR

  109. 109. Examples
      • td-agent installation script

            #!/usr/bin/env bash
            # Enterprise Linux 7 (releasever is '7')
            # add GPG key
            rpm --import http://packages.treasuredata.com/GPG-KEY-td-agent
            # add treasure data repository to yum
            # (the heredoc delimiter is quoted so that $basearch reaches yum unexpanded)
            cat > /etc/yum.repos.d/td.repo <<'EOF'
            [treasuredata]
            name=TreasureData
            baseurl=http://packages.treasuredata.com/2/redhat/7/$basearch
            gpgcheck=1
            gpgkey=http://packages.treasuredata.com/GPG-KEY-td-agent
            EOF
            # install the toolbelt
            yum install -y td-agent-2.1.5-1
            # install plugins
            /opt/td-agent/embedded/bin/fluent-gem install --no-document fluent-plugin-tail_path -v "=0.0.3"
            /opt/td-agent/embedded/bin/fluent-gem install --no-document fluent-plugin-forest -v "=0.3.0"
            /opt/td-agent/embedded/bin/fluent-gem install --no-document fluent-plugin-add -v "=0.0.3"
            # this plugin will be no longer required in next td-agent version.
            /opt/td-agent/embedded/bin/fluent-gem install --no-document fluent-plugin-s3 -v "=0.5.7"
            # enable service
            chkconfig td-agent on
