Atmosphere 2016 - Pawel Mastalerz, Wojciech Inglot - New way of building infrastructure

Creating infrastructure for global web and mobile applications can be hard. Creating infrastructure for fast-growing global applications can be very hard :) At Brainly we had to move from a traditional LAMP setup on bare-metal servers to something new, and the cloud alone was not enough. With software like Ansible, Mesos, Docker, and Consul, we designed a fully automated, immutable setup, even with tests! In this presentation we will show you how, and share our experience of running this kind of platform.

Published in: Technology

  1. NEW WAY OF BUILDING INFRASTRUCTURE
     Paweł Mastalerz, DevOps Engineer @ Brainly, pawel.mastalerz@brainly.com
     Wojciech Inglot, DevOps Engineer @ Brainly, wojciech.inglot@brainly.com
  2. Presentation Plan
     1. About Brainly
     2. Current infrastructure
     3. SOA (µservices)
     4. Idea of new infrastructure
     5. How have we made the concept alive?
     6. Summary
  3. About Brainly
     Brainly is the world’s largest social learning platform
     ● 60M Monthly Unique Users
     ● 4K requests per second
     ● 360M Monthly Page Views
  4. Current Infrastructure
  5. Current infrastructure
     ● 100+ dedicated servers
     ● 700 Mbps during peak traffic
     ● 800+ LXC containers
     ● LAMP stack
     ● Ansible
  6. Current infrastructure
     Pros:
     ● Low costs
     ● Full server power 24x7
     ● Good community support for LAMP
     Cons:
     ● We need to take care of hardware failures
     ● Designed to run only the LAMP stack
     ● Low network stability
     ● Scaling is too slow and time-consuming
  7. SOA (µservices)
  8. Why Service Oriented Architecture?
     ● Simple design focused on one business capability
     ● Can be developed independently by different teams
     ● Can be developed using different programming languages and tools
     ● Decentralized data management
     ● Can be easily scaled by adding more instances of a service
  9. Why µservices?
     ● Erlang-style approach to failure
     ● Small codebase
     ● Each service runs in its own process
  10. Idea of new infrastructure
  11. Infrastructure as Code
      ● Defining the infrastructure as a DSL
      ● Documentation
      ● Changes go through review via PRs, as in the normal development cycle
  12. Immutability
      ● Using pre-built images
      ● No live provisioning
      ● Immutability = Confidence
  13. Scalable
      ● Easy to add more servers when needed
      ● Shut them down when not needed
      ● Done automatically
  14. Fault Tolerant
      ● Failure of a few servers should not affect the system
      ● Failure of an entire zone should not affect the system
      ● Fast and relatively easy to fix or replace failed servers
  15. Language agnostic
      ● PHP is not enough
      ● Allows running applications written in any language
      ● Use the best language to solve a specific problem
  16. How have we made the concept alive?
  17. Core: Cloud instead of bare metal
      ● A third party takes care of the hardware
      ● Isolated private network
      ● Elastic costs
  18. Core: Mesos
      ● Easy scaling
      ● Abstracts resource management away from the applications doing the processing
      ● Handles cluster health and management
      ● Production proven at massive scale
  19. Core: Marathon
      ● Designed to run on Mesos
      ● Creates tasks for an app
      ● Rolling deploy / restart
      ● Evaluates application health using HTTP or TCP checks, aiming for 100% uptime
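To make the HTTP/TCP health-check idea from the Marathon slide concrete, here is a minimal Python sketch. It is not Marathon's implementation: the function names are made up for illustration, and the rule that a status in the 200-399 range counts as healthy and that a task is given up on after a number of consecutive failures is an assumption modelled on how such checks are commonly configured.

```python
import socket
import urllib.error
import urllib.request


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """Healthy if a TCP connection can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_check(url: str, timeout: float = 2.0) -> bool:
    """Healthy if the final response status is in the 200-399 range."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # 4xx/5xx answers, refused connections and timeouts all count as failures.
        return False


def exceeds_failure_limit(results: list[bool], max_consecutive_failures: int) -> bool:
    """True once the check history contains enough failures in a row.

    `results` is a hypothetical list of past check outcomes, oldest first.
    """
    streak = 0
    for healthy in results:
        streak = 0 if healthy else streak + 1
        if streak >= max_consecutive_failures:
            return True
    return False
```

A scheduler would run one of the checks on an interval, append each outcome to the history, and replace the task once `exceeds_failure_limit` returns true.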
  20. Core: Consul
      ● Service discovery
      ● Multi datacenter
      ● Health checks
      ● Nginx configuration templating with consul-template
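The Nginx templating bullet can be illustrated with a small Python sketch of the kind of output such a template produces. consul-template itself uses Go templates and reads from Consul; here the `Instance` type, the `render_upstream` function, and the fallback server are hypothetical stand-ins, and the instance list is passed in by hand.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    """One registered service instance (hypothetical shape for this sketch)."""
    address: str
    port: int
    healthy: bool


def render_upstream(service: str, instances: list[Instance]) -> str:
    """Render an nginx upstream block listing only the healthy instances."""
    servers = [f"    server {i.address}:{i.port};" for i in instances if i.healthy]
    if not servers:
        # nginx refuses an upstream without servers, so point at a port
        # where nothing is expected to listen; requests then fail fast.
        servers = ["    server 127.0.0.1:65535;"]
    return "\n".join([f"upstream {service} {{", *servers, "}"])


print(render_upstream("api", [
    Instance("10.0.0.5", 31001, True),
    Instance("10.0.0.6", 31002, False),
]))
```

Re-rendering this block and reloading nginx whenever the set of healthy instances changes is what keeps the load balancer in step with service discovery.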
  21. Core: Docker
      ● Language agnostic
      ● Natively supported by Mesos and Marathon
      ● Process isolation
      ● Automatic registration/deregistration in Consul with Registrator
  22. Core: Wired up
  23. Data bus: RabbitMQ
      ● Automatic clustering with Consul
      ● High availability
      ● Easy scaling
      ● A variety of programs and libraries support the AMQP protocol
  24. Data bus: RabbitMQ
  25. Continuous Delivery with Ansible
  26. Continuous Delivery with Ansible
      ● Deploy declaration in a .yaml file stored in the repository
        ○ Resources for a single instance
        ○ Minimum number of instances
        ○ Environment variables and secrets
        ○ Before/after deployment scripts to run
      ● Loaded and processed by Ansible during build
      ● Secrets are injected from Ansible Vault into the microservice during deployment
  27. Continuous Delivery with Ansible
      deploy:
        instances: 5
        public: true
        resources:
          cpus: 0.1
          memory: 64.0
        environment:
          RABBITMQ_HOSTNAME: ${RABBITMQ_HOSTNAME}
          ALGOLIA_API_KEY: dev123
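As a rough illustration of how a deploy declaration like the one above might be turned into something a scheduler accepts, the Python sketch below maps it onto a Marathon-style app definition. The slides do not show this step, so the function `to_marathon_app`, the app id, the image name, and the substituted hostname are all assumptions; the declaration is passed as an already parsed dictionary to avoid depending on a YAML library.

```python
import json
from string import Template


def to_marathon_app(app_id: str, image: str, deploy: dict, variables: dict) -> dict:
    """Build a Marathon-style app definition from a parsed deploy declaration.

    `${NAME}` placeholders in environment values are filled from `variables`;
    a missing variable raises KeyError instead of deploying a broken value.
    """
    env = {
        name: Template(str(value)).substitute(variables)
        for name, value in deploy.get("environment", {}).items()
    }
    # The slides do not say how `public` is used, so it is ignored here.
    return {
        "id": app_id,
        "instances": deploy["instances"],
        "cpus": deploy["resources"]["cpus"],
        "mem": deploy["resources"]["memory"],
        "env": env,
        "container": {"type": "DOCKER", "docker": {"image": image}},
    }


deploy = {
    "instances": 5,
    "public": True,
    "resources": {"cpus": 0.1, "memory": 64.0},
    "environment": {
        "RABBITMQ_HOSTNAME": "${RABBITMQ_HOSTNAME}",
        "ALGOLIA_API_KEY": "dev123",
    },
}
app = to_marathon_app(
    "/example-service",              # hypothetical app id
    "example/service:1.0",           # hypothetical Docker image
    deploy,
    {"RABBITMQ_HOSTNAME": "rabbitmq.service.consul"},  # hypothetical value
)
print(json.dumps(app, indent=2))
```

Keeping the translation in one pure function makes it easy to test the declaration format in CI before anything is sent to the cluster.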
  28. Centralized logs
      ● ELK stack
      ● EC2 Discovery plugin - easy scaling of Elasticsearch data nodes
      ● > 100 GB of logs a day
  29. Monitoring
      ● Using tools developed by InfluxData: Telegraf and InfluxDB
      ● Tracking resource usage for each microservice - CPU, memory, network, events in the data bus
      ● Gathering stats for each server
      ● Visualisation in Grafana
  30. Monitoring
  31. Alerting
      ● Each µservice should define a contract under which quality of service will be delivered, measured, and monitored
      ● The file must be stored in the root directory of the project, under the name .sla.yml
  32. Alerting
      contract:
        http:
          max_response_time: 100
          max_response_time_time_window: 360
          max_5xx: 2
          max_5xx_time_window: 10
        log:
          panic:
            max: 1
            time_window: 360

      responsible:
        - slack: hash.....
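One way to read the HTTP part of such a contract is sketched below in Python. The slides do not define the semantics, so several things here are assumptions: response times are taken to be milliseconds, time windows to be seconds, `max_response_time` is compared with the mean over its window, and `max_5xx` with the count of 5xx answers in its window. The `Request` type and the `violations` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    timestamp: float      # seconds since some epoch
    response_time: float  # assumed to be milliseconds
    status: int           # HTTP status code


def violations(contract: dict, requests: list[Request], now: float) -> list[str]:
    """Return the names of the HTTP contract limits that are currently broken."""
    http = contract["http"]
    found = []

    # Mean response time over the response-time window.
    window = http["max_response_time_time_window"]
    recent = [r for r in requests if now - r.timestamp <= window]
    if recent:
        mean = sum(r.response_time for r in recent) / len(recent)
        if mean > http["max_response_time"]:
            found.append("max_response_time")

    # Number of 5xx answers over the (shorter) error window.
    window = http["max_5xx_time_window"]
    errors = [
        r for r in requests
        if 500 <= r.status < 600 and now - r.timestamp <= window
    ]
    if len(errors) > http["max_5xx"]:
        found.append("max_5xx")

    return found
```

An alerting job could call `violations` periodically and notify whoever is listed under `responsible` when the returned list is not empty.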
  33. Summary
  34. Platform in numbers
      ● 28 (micro)services
      ● 72 instances
      ● 18 GB memory
      ● 60% CPU utilization
      ● 30s - deploy time for most services
  35. Benefits - Dev
      ● We have an entire set of Dockerized services, written in Python, for machine learning
      ● OCR µservices written in Go and Python
      ● Common environment for DEV and PROD
  36. Benefits - Ops
      ● More time spent on actual problems rather than fixing broken RAID :)
      ● Easy and fast scaling
      ● Utilize the DevOps approach
  37. Thanks! Questions?
