AWS guerrilla orchestration
KEY FIGURES
• 100k qps peak
• 6B requests/day
• 8 TB/day
• More than 20 services
• 750 servers peak
• 2 Regions (eu-west-1 and us-east-1)
• 5 AZs
• 30 ASGs
• Over 20 API integrations (Google, Twitter, Facebook, AppNexus, eBay, …)
What we do (engineer’s perspective)
EC2 API based auto-discovery
WHY
• Cross region replication (EU and US)
• Cost (leverage Spot Instances)
• HyperLogLog
• Writable slaves (for set operations)
• Centralized monitoring and logging
Homemade Redis autoscaling
Our Redis structure
HOW
• Master EU
• Slaves EU (2-8)
• Replication over VPN
• Master US
• Slaves US (4-21)
• 1.2M ops at peak
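Slaves replicating over the VPN are only useful once their link to the master is up. The deck does not show how replication state is checked; a minimal sketch, assuming the raw text of the Redis `INFO replication` command is available, could look like this. `role`, `master_link_status` and `master_sync_in_progress` are real INFO fields; the helper names are hypothetical.

```python
def parse_info(raw: str) -> dict[str, str]:
    """Parse the text returned by Redis INFO into a flat key -> value dict."""
    fields: dict[str, str] = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        # Split on the first colon only; some values contain colons or commas.
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def is_replicated_slave(info: dict[str, str]) -> bool:
    """True if this is a slave whose master link is up and no full sync is running."""
    return (
        info.get("role") == "slave"
        and info.get("master_link_status") == "up"
        and info.get("master_sync_in_progress", "0") == "0"
    )
```

Many client libraries already return INFO as a parsed mapping (possibly with numeric values), in which case only the predicate is needed.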
WE CARE ABOUT THE COSTS
• Two ASGs per region (on demand + spot)
• Only slaves run in ASGs
• Spot scales more aggressively
• All ASGs in one region behind same ELB
• ELB with TCP load balancing
• Jenkins job to detect Spot market crashes
Deployment strategy
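The Jenkins job itself is not shown in the deck. One way such a check could work, sketched under assumptions: treat the Spot market as "crashed" when the spot ASG fills less than a chosen fraction of its desired capacity, and ask the on-demand ASG in the same region to cover the gap. The 50% tolerance and all function names are hypothetical; in a real job the numbers would come from the Auto Scaling `DescribeAutoScalingGroups` call and the result would be applied with `SetDesiredCapacity`.

```python
from dataclasses import dataclass


@dataclass
class AsgState:
    desired: int     # DesiredCapacity of the group
    in_service: int  # instances currently InService


def spot_market_crashed(spot: AsgState, tolerance: float = 0.5) -> bool:
    """Hypothetical rule: the spot ASG fills less than `tolerance` of what it asked for."""
    if spot.desired == 0:
        return False
    return spot.in_service < spot.desired * tolerance


def on_demand_target(on_demand: AsgState, spot: AsgState, max_size: int) -> int:
    """New desired capacity for the on-demand ASG in the same region."""
    if not spot_market_crashed(spot):
        return on_demand.desired
    # Cover the slaves the spot market could not deliver, capped by the group's MaxSize.
    missing = spot.desired - spot.in_service
    return min(on_demand.desired + missing, max_size)
```

Because both ASGs sit behind the same ELB, clients need no change when capacity shifts from spot to on-demand.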
WHY NOT
• 1-2ms penalty per request
• Long-lasting connections
• New machines sit idle (existing connections stay where they are)
• Cross-AZ requests add more latency
• Doesn’t consider replication state
Going through ELB
DISTRIBUTED REDIS CONNECTION BALANCING
• If anything fails, fall back to the ELB
• Get the current host’s AZ from the AWS instance metadata service
• Get the Redis instances registered with the ELB
• Prefer instances in the same AZ; fall back to other AZs
• Use only running, healthy and fully replicated instances
• Check the number of connected clients and ops on each selected Redis
• Pick a Redis using a biased distribution and connect to it
Sneak behind ELB
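The balancing steps above can be sketched as follows. This is a minimal, hypothetical version: how nodes are fetched from the ELB and INFO is left to the caller, and the inverse-load weighting is an assumed stand-in for the deck's "biased distribution". The metadata URL is the real IMDSv1 path for the instance's AZ; hosts that enforce IMDSv2 must request a session token first.

```python
from __future__ import annotations

import random
import urllib.request
from dataclasses import dataclass
from typing import Callable

# IMDSv1-style path; hosts that enforce IMDSv2 need a session token first.
METADATA_AZ_URL = "http://169.254.169.254/latest/meta-data/placement/availability-zone"


def current_az(timeout: float = 0.5) -> str | None:
    """Return this host's AZ from the instance metadata service, or None on failure."""
    try:
        with urllib.request.urlopen(METADATA_AZ_URL, timeout=timeout) as resp:
            return resp.read().decode().strip()
    except OSError:
        return None


@dataclass
class RedisNode:
    host: str
    az: str
    healthy: bool     # running and InService behind the ELB
    replicated: bool  # e.g. master_link_status:up in INFO replication
    clients: int      # connected_clients from INFO
    ops: int          # instantaneous_ops_per_sec from INFO


def pick_redis(nodes: list[RedisNode], my_az: str | None, elb_host: str, rng=random) -> str:
    # Only consider nodes that are up and fully replicated.
    usable = [n for n in nodes if n.healthy and n.replicated]
    # Prefer the local AZ; fall back to any AZ if none is usable locally.
    local = [n for n in usable if n.az == my_az]
    pool = local or usable
    if not pool:
        return elb_host
    # Hypothetical bias: lighter-loaded nodes get proportionally higher weight.
    weights = [1.0 / (1 + n.clients + n.ops / 1000) for n in pool]
    return rng.choices(pool, weights=weights, k=1)[0].host


def choose_endpoint(fetch_nodes: Callable[[], list[RedisNode]], my_az: str | None, elb_host: str) -> str:
    """Any error while discovering or ranking nodes falls back to the ELB."""
    try:
        return pick_redis(fetch_nodes(), my_az, elb_host)
    except Exception:
        return elb_host
```

Weighting by load rather than always taking the least-loaded node keeps many clients that connect at the same moment from all piling onto one freshly launched slave.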
Client Redis connection lifecycle
ØMQ mesh pipeline
OUR PIPELINE
• Unidirectional data flow
• Multiple ASG service layers
• Machines come and go all the time
• CPU-based scaling
• 6 billion messages per day
• 100k messages per second at peak
Event driven architecture
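To illustrate the unidirectional, layered flow in a single process, here is a sketch that replaces each ØMQ hop with a `queue.Queue`. In the deck's setup each hop would be a PUSH socket in the upstream layer and a PULL socket in the downstream layer (`zmq.PUSH` / `zmq.PULL` in pyzmq), spread over many machines; here every layer is one thread and the layer functions are hypothetical.

```python
import queue
import threading
from typing import Any, Callable, Iterable

STOP = object()  # sentinel that shuts a stage down and is forwarded downstream


def stage(work: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Pull from the upstream layer, transform, push to the downstream layer."""
    while True:
        msg = inbox.get()
        if msg is STOP:
            outbox.put(STOP)
            return
        outbox.put(work(msg))


def run_pipeline(messages: Iterable[Any], layers: list[Callable[[Any], Any]]) -> list[Any]:
    # One queue per hop; data only ever moves from queues[i] to queues[i + 1].
    queues = [queue.Queue() for _ in range(len(layers) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(layers)
    ]
    for t in threads:
        t.start()
    for msg in messages:
        queues[0].put(msg)
    queues[0].put(STOP)
    # The last queue acts as the sink.
    results = []
    while (msg := queues[-1].get()) is not STOP:
        results.append(msg)
    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10])` yields `[20, 30, 40]`. The sketch has no error handling; unlike a real ØMQ PUSH socket, it also cannot fan out over several downstream workers.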
ØMQ
WHY
• Connect your code in any language, on any platform
• Carries messages across inproc, IPC, TCP, TIPC, multicast
• Smart patterns like request-reply, pub-sub, push-pull and router-dealer
• High-speed asynchronous I/O engines, in a tiny library
• Build any architecture: centralized, distributed, small or large
• Smart handling of establishing connections and reconnecting
HOW
• Define your network topology using subnets in the VPC
• Assigning subnets to ASGs tells you in advance where each service can reside
• HINT: Don’t make a mess; there are enough subnets to spare
Placement to the rescue
CONNECT TO WHERE THE SERVER WILL BE
Mesh (or was it mess) architecture
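Because each service layer is pinned to known subnets, a node can work out which peers belong to the next layer from their private IPs alone. The sketch below assumes a hypothetical subnet layout and port, and takes instance records shaped like items from EC2 `DescribeInstances` (`PrivateIpAddress`, `State.Name`). It builds the wanted `tcp://` endpoints and diffs them against current connections. ØMQ lets a socket `connect()` to an endpoint before anything is bound there and keeps retrying, which is what makes connecting ahead of time workable.

```python
from __future__ import annotations

import ipaddress

# Hypothetical layout: each service layer's ASG is pinned to its own subnets.
LAYER_SUBNETS = {
    "collector": ["10.0.1.0/24", "10.0.2.0/24"],
    "processor": ["10.0.11.0/24", "10.0.12.0/24"],
}


def layer_of(ip: str, layer_subnets: dict[str, list[str]] = LAYER_SUBNETS) -> str | None:
    """Map a private IP to the service layer whose subnets contain it."""
    addr = ipaddress.ip_address(ip)
    for layer, cidrs in layer_subnets.items():
        if any(addr in ipaddress.ip_network(cidr) for cidr in cidrs):
            return layer
    return None


def endpoints_for(layer: str, instances: list[dict], port: int,
                  layer_subnets: dict[str, list[str]] = LAYER_SUBNETS) -> set[str]:
    """tcp:// endpoints of running instances that sit in the given layer's subnets."""
    endpoints = set()
    for inst in instances:
        ip = inst.get("PrivateIpAddress")
        running = inst.get("State", {}).get("Name") == "running"
        if ip and running and layer_of(ip, layer_subnets) == layer:
            endpoints.add(f"tcp://{ip}:{port}")
    return endpoints


def reconcile(current: set[str], wanted: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_connect, to_disconnect); with pyzmq, feed them to socket.connect/disconnect."""
    return wanted - current, current - wanted
```

Running the reconcile step periodically lets sockets follow machines as they come and go, at the cost of EC2 API calls, which are subject to the API limit noted in the summary.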
EC2 API based auto-discovery
• No maintenance
• Handles health checks
• Matches your deployment perfectly
• Adapts to changes fast
• Has an unknown, unscalable API rate limit :(
ØMQ mesh pipeline
• Quick and dirty to set up
• Small and fast
• Queue is local to machines
• Limited scale (we tested up to 762 servers)
Thank you for your attention.
@utvara