Juan Vicente Herrera Ruiz de Alejo
@jvicenteherrera
About Me
● @jvicenteherrera
● juan.vicente.herrera@gmail.com
● http://www.linkedin.com/in/jvherrera
● http://juanvicenteherrera.eu
Description
● Social feed aggregation and recommendation app
● Client developed by a Global Fortune 500 company that makes video game consoles, TVs and, many years ago, Walkmans…
● Expected by the end of 2013: around 1,000,000 new users registered on the platform and 170,000 DAU
● All servers run in AWS; deployments and configuration management are handled by Chef.
System stats
Main Components
– Custom API (Java)
– Beanstalk
– RabbitMQ
– Redis
– MongoDB (Sharding)
EC2
– Production env: reserved instances for the minimum configuration; on-demand instances for scale-out.
– Staging env: reserved instances for ½ day
– Elastic Load Balancers
– Security Groups and ACLs
– One key pair per subnet
– Current EC2 region is US East
Main AWS products used
Architecture/Infrastructure
[Diagram: six VPC subnets with their security groups. Subnet labels as they appear on the slide: DEV, Stage APP, Stage DB, Prod APP, Stage DB. Labeled nodes: DNS, VPN, DEV NAT, Stage NAT, Prod NAT (twice), public Nexus, public Git server, public Chef, public Jenkins, Nagios forwarder, "ELB 1 Web Servers Stage" and "ELB 1 Web Servers Prod".]
APP and DB VPC
[Diagram: two VPC subnets, Prod APP and Prod DB, each with its own security group. Labeled nodes: MongoDB config servers (Config1, Config2, Config2); two MongoDB sets (set1 and set2), each with Mongodb1, Mongodb2 and "Mongoarb"; MySQL master and slave; LDAP master and slave; ELB1 in front of the APP1 servers; App2 master and slave; Solr master and slave; Varnish master and slave; Alfresco; APP2 servers; APP3 servers; Redis master and slave; Logs; RabbitMQ servers.]
Improvements achieved (I)
● The APIs are stateless, so scaling out is easy. Nodes are created with Chef (knife).
● Tight integration with Chef. It ensures the same configuration in every environment and avoids misconfigurations in production. Bootstrapping EC2 instances with Chef is fully integrated with knife.
● A quick, reliable way to create an exact mirror of production (staging) with Chef and CloudFormation
– Before AWS/Chef → creating a staging env took 6 weeks
– After AWS/Chef → creating a staging env takes less than 1 day
Improvements achieved (II)
● Lower cost of running non-production environments
– Before AWS/Chef → environments up 24×7
– After AWS/Chef → environments up 8 hours per working day (cron scripts that use the API Tools)
– Python script example
● Outage recovery plan based on node snapshots (MongoDB) or Chef (the other nodes are stateless)
● Very quick response and customized consulting for the project from the Amazon team.
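The deck's own Python script is not reproduced in this transcript. As a rough sketch of the scheduling decision such a cron-driven script might make, the snippet below only decides whether a non-production environment should be up; the 09:00 to 17:00 window and all function names are assumptions for illustration, and the actual start/stop calls to AWS are deliberately left out.

```python
from datetime import datetime

# Assumed working window: 09:00-17:00, Monday to Friday (8 hours per working day).
WORK_START_HOUR = 9
WORK_END_HOUR = 17


def should_be_running(now: datetime) -> bool:
    """Return True when a non-production environment should be up at `now`."""
    is_working_day = now.weekday() < 5  # Monday is 0, Friday is 4
    in_window = WORK_START_HOUR <= now.hour < WORK_END_HOUR
    return is_working_day and in_window


def desired_action(now: datetime, currently_running: bool) -> str:
    """Decide what a cron-driven script would do: 'start', 'stop' or 'none'."""
    wanted = should_be_running(now)
    if wanted and not currently_running:
        return "start"
    if not wanted and currently_running:
        return "stop"
    return "none"


def weekly_uptime_fraction() -> float:
    """Share of a 24x7 week that the environment stays up under this schedule."""
    hours_up = 5 * (WORK_END_HOUR - WORK_START_HOUR)
    return hours_up / (7 * 24)
```

Under these assumptions the environments are up 40 of the 168 hours in a week, which is where the saving against 24×7 operation comes from.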
Example: Create a new node
Staging example with dynamic IP (DHCP)
knife ec2 server create -I ami-af71f8c6 -r "role[apache]" -f m1.medium --region us-east-1 -S scp-staging -i /Users/juanvi/keypairs/scp-staging.pem -g sg-2418e54b -s subnet-919cecfc -x ec2-user -N stapp-apache-Test -E staging
Staging example with static IP
ec2-run-instances ami-af71f8c6 -k vpc-public-10-234-1 -g sg-379e6d58 -s subnet-cb9596a0 -t m1.xlarge --private-ip-address 10.234.2.204
knife bootstrap 10.234.2.204 -i /Users/juanvi/keypairs/scp-staging.pem -r "role[webserver]" -N STAGING-public-webserver2 -x ec2-user -E staging --sudo
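When several similar nodes have to be created, the dynamic-IP command can be assembled from a node description instead of being typed by hand. The sketch below only builds the argument list and does not run knife; it uses exactly the flags shown on the slide, while the helper name and dictionary keys are hypothetical.

```python
import shlex
from typing import Dict, List


def knife_ec2_create_command(node: Dict[str, str]) -> List[str]:
    """Build the argument list for `knife ec2 server create` from a node description."""
    return [
        "knife", "ec2", "server", "create",
        "-I", node["ami"],
        "-r", node["run_list"],
        "-f", node["flavor"],
        "--region", node["region"],
        "-S", node["key_pair"],
        "-i", node["identity_file"],
        "-g", node["security_group"],
        "-s", node["subnet"],
        "-x", node["ssh_user"],
        "-N", node["name"],
        "-E", node["environment"],
    ]


# Values copied from the staging example on the slide.
staging_apache = {
    "ami": "ami-af71f8c6",
    "run_list": "role[apache]",
    "flavor": "m1.medium",
    "region": "us-east-1",
    "key_pair": "scp-staging",
    "identity_file": "/Users/juanvi/keypairs/scp-staging.pem",
    "security_group": "sg-2418e54b",
    "subnet": "subnet-919cecfc",
    "ssh_user": "ec2-user",
    "name": "stapp-apache-Test",
    "environment": "staging",
}

command = knife_ec2_create_command(staging_apache)
# shlex.join quotes arguments where needed, so the line can be pasted into a shell.
print(shlex.join(command))
```

Keeping the command as a list rather than a string means it could be handed to `subprocess.run` without further quoting.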
What we have learned
● Running servers in more than one availability zone (on the slide: us-east-1a and us-east-1d) is strongly recommended, to avoid total downtime if one zone has an outage
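One simple way to apply this is to place nodes round-robin across the zones. The following sketch assumes the two zones named on the slide; the node names and helper functions are made up for illustration.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, List

# Zones taken from the slide; any set of zones in the region would do.
ZONES = ["us-east-1a", "us-east-1d"]


def assign_zones(node_names: List[str], zones: List[str]) -> Dict[str, str]:
    """Spread nodes across zones in round-robin order."""
    return {name: zone for name, zone in zip(node_names, cycle(zones))}


def survives_zone_outage(placement: Dict[str, str], failed_zone: str) -> bool:
    """True if at least one node remains outside the failed zone."""
    return any(zone != failed_zone for zone in placement.values())


# Hypothetical node names.
placement = assign_zones(["api-1", "api-2", "api-3", "api-4"], ZONES)
nodes_per_zone = Counter(placement.values())
```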
What we have learned (II)
● For certain load-balanced services, use TCP instead of HTTP. Balancing requests across our API nodes internally over TCP solved some problems with HTTP requests whose sessions were never closed. We use HTTP balancing only for requests that reach the public Apache.
We noticed that many Apache connections were not closed properly in HTTP balancing mode, and within a few hours we reached the connection limit.
Solved with TCP balancing mode in the ELB.
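The difference between the two balancing modes comes down to the listener protocol. The sketch below expresses both setups as plain dictionaries whose keys follow the listener fields of the classic Elastic Load Balancing API (Protocol, LoadBalancerPort, InstanceProtocol, InstancePort); the port numbers and variable names are assumptions, and nothing here calls AWS.

```python
from typing import Dict, List, Set, Union

Listener = Dict[str, Union[str, int]]

# Internal balancer in front of the API nodes: plain TCP pass-through.
# Port 8080 is an assumption for illustration.
internal_api_listeners: List[Listener] = [
    {
        "Protocol": "TCP",
        "LoadBalancerPort": 8080,
        "InstanceProtocol": "TCP",
        "InstancePort": 8080,
    },
]

# Public balancer in front of Apache: HTTP balancing is kept here.
public_apache_listeners: List[Listener] = [
    {
        "Protocol": "HTTP",
        "LoadBalancerPort": 80,
        "InstanceProtocol": "HTTP",
        "InstancePort": 80,
    },
]


def balancing_modes(listeners: List[Listener]) -> Set[str]:
    """Return the set of front-end protocols a balancer listens on."""
    return {str(listener["Protocol"]) for listener in listeners}
```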
What we have learned (III)
● Use CloudFormation to create network resources automatically.
– Before CloudFormation → every resource was created one by one
– After CloudFormation → all the nodes and network resources of an entire environment are created automatically in a single execution
– CloudFormation example
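The CloudFormation example from the deck is not included in this transcript. As a minimal sketch of what such a template can look like, the snippet below builds one as a Python dictionary and serializes it to JSON. The resource names and CIDR ranges are assumptions (the 10.234.x.x range is borrowed from the static-IP example), and a real environment would add many more resources.

```python
import json

# Minimal template: one VPC, one subnet and one security group.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch: network resources for a staging environment",
    "Resources": {
        "StagingVpc": {
            "Type": "AWS::EC2::VPC",
            "Properties": {"CidrBlock": "10.234.0.0/16"},
        },
        "StagingAppSubnet": {
            "Type": "AWS::EC2::Subnet",
            "Properties": {
                "VpcId": {"Ref": "StagingVpc"},
                "CidrBlock": "10.234.2.0/24",
            },
        },
        "StagingAppSecurityGroup": {
            "Type": "AWS::EC2::SecurityGroup",
            "Properties": {
                "GroupDescription": "Staging application servers",
                "VpcId": {"Ref": "StagingVpc"},
            },
        },
    },
}

# The JSON text is what would be submitted to CloudFormation as the template body.
template_json = json.dumps(template, indent=2)
```

Because every resource refers to the VPC through `Ref`, a single stack creation brings up the whole set in dependency order, which is the "one execution" benefit described above.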
What we have learned (IV)
● Analyze performance test results to choose the minimum number of nodes that will run 24×7, and their sizes, so you know which instances to reserve. Reserved instances reduce the cost to 2/3.
– Before AWS/Chef → performance tests were limited because servers were unavailable due to their cost. Tests were simulated.
– After AWS/Chef → high-powered instances are available on demand for just a few hours or days, at a reduced cost
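The trade-off between reserved and on-demand capacity can be put into a small calculation. This sketch takes the 2/3 factor from the slide; the hourly price, the 720-hour month and the function name are placeholders, not real AWS figures.

```python
# From the slide: reserved instances reduce the cost to 2/3 of on-demand.
RESERVED_COST_FACTOR = 2 / 3


def monthly_cost(
    baseline_nodes: int,
    burst_node_hours: float,
    on_demand_hourly: float,
    hours_in_month: int = 720,
) -> float:
    """Cost of a reserved 24x7 baseline plus on-demand hours used for scale-out."""
    reserved = baseline_nodes * hours_in_month * on_demand_hourly * RESERVED_COST_FACTOR
    burst = burst_node_hours * on_demand_hourly
    return reserved + burst


# Hypothetical numbers: 3 baseline nodes, 100 extra node-hours, 1.0 per hour.
all_on_demand = 3 * 720 * 1.0 + 100 * 1.0
with_reservation = monthly_cost(3, 100, 1.0)
```

With these placeholder numbers the reserved baseline costs 1440 instead of 2160, while the burst hours cost the same either way, so the saving depends on sizing the baseline correctly from the performance tests.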
What we have learned (V)
● It is advisable to run a large number of small instances close to 100% CPU usage, rather than a few powerful machines with wasted resources, and to launch new nodes and balance requests among them when load increases.
● Ask for the load balancers to be pre-warmed if you expect an exponential increase in requests
● Ask support to raise the initial limit on the number of EC2 instances that can run simultaneously (20)
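A rough sizing rule follows from the first and last points: work out how many small nodes a given load needs at a target utilization, then check the result against the initial limit of 20 instances. The per-node capacity, the 90% target and the function names below are assumptions for illustration.

```python
import math

# From the slide: initial limit on simultaneously running EC2 instances.
EC2_INITIAL_INSTANCE_LIMIT = 20


def nodes_needed(
    requests_per_second: float,
    capacity_per_node: float,
    target_utilization: float = 0.9,
) -> int:
    """Number of small nodes needed to serve the load at the target utilization."""
    if capacity_per_node <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("capacity must be positive and utilization in (0, 1]")
    usable_capacity = capacity_per_node * target_utilization
    return max(1, math.ceil(requests_per_second / usable_capacity))


def needs_limit_increase(node_count: int, limit: int = EC2_INITIAL_INSTANCE_LIMIT) -> bool:
    """True when the node count exceeds the account's instance limit."""
    return node_count > limit
```

For example, with nodes that each handle 100 requests per second, a load of 2000 requests per second needs 23 nodes at 90% utilization, which is already above the initial limit.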
Things to consider
• You must adapt to the available instance sizes, whose resources (CPU, RAM...) are predefined and not customizable
• You have no control over the evolution of the products your service depends on
• You don't have access to the logs of some instances (for example, load balancers)
• Risk of becoming tied to AWS services, and the resulting difficulty of migrating to another data center.
Thank you for your attention
● @jvicenteherrera
● juan.vicente.herrera@gmail.com
● http://www.linkedin.com/in/jvherrera
● http://juanvicenteherrera.eu

AWS migration: getting to Data Center heaven with AWS and Chef
