PaaSTA: Autoscaling at Yelp

Nathan Handler
nhandler@yelp.com / @nathanhandler
PaaSTA
Autoscaling at Yelp

● Nathan Handler / @nathanhandler
● Site Reliability Engineer at Yelp
● PaaSTA Developer and Maintainer
● Ubuntu/Debian GNU/Linux Developer
● freenode IRC staff member
Who am I?
2

Yelp’s Mission
Connecting people with great
local businesses.
3

Yelp Stats
As of Q3 2016
97M 3274%115M

● Monolithic Python application (~3M LoC)
● Builds/deployments took a long time
○ Bottleneck on how often we can deploy
● Mistakes are painful
○ Large impact
○ Difficult to find
○ Slow to fix
History
5

● Split features into different applications
● Smaller services allowed for faster pushes
● Easier to reason about issues
● Able to scale services independently
Service Oriented Architecture v1
6

● Standalone application
● Stateless
● Separate git repository
● Typically at Yelp:
○ HTTP API
○ Python, Pyramid, uWSGI
○ virtualenv
What is a service?
7

● Statically defined list of hosts to deploy a service on
● Operations decides which hosts to deploy to
● Monitoring manually configured in Nagios
● Manual deployment system via rsync
Deploying Services v1
8

● Yelp's Platform as a Service
● Builds, Deploys, Connects, and Monitors services
● Glue around existing and established open source
tools
PaaSTA
11
https://github.com/yelp/paasta
#paasta on irc.freenode.net

PaaSTA Components
13
Docker
Registry
Developer
git push
git pull
git push
docker push
13
Marathon
Sensu

PaaSTA Components
14
Developer
git push
14

.
├── Dockerfile
├── htdocs
│ ├── index.php
│ └── status
└── Makefile
15
A simple service

DOCKER_TAG ?= $(USER)-dev
test:
@echo 'Unit testing'
itest: cook-image
paasta local-run --healthcheck --service devops
cook-image:
docker build -t $(DOCKER_TAG) .
16
$ cat Makefile

DOCKER_TAG ?= $(USER)-dev
test:
@echo 'Unit testing'
itest: cook-image
paasta local-run --healthcheck --service devops
cook-image:
docker build -t $(DOCKER_TAG) .
17
$ cat Makefile

Containers: like lightweight VMs
Provides a language (Dockerfile) for describing
container image
Reproducible builds (mostly)
Provides software flexibility
18
Docker
docker.com

FROM ubuntu:xenial
MAINTAINER Nathan Handler <nhandler@yelp.com>
RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get -y install
apache2
libapache2-mod-php
ENV APACHE_RUN_USER www-data
ENV APACHE_RUN_GROUP www-data
ENV APACHE_LOG_DIR /var/log/apache2
ENV APACHE_LOCK_DIR /var/lock/apache2
ENV APACHE_PID_FILE /var/run/apache2.pid
RUN rm -f /var/www/html/index.html
COPY htdocs /var/www/html/
CMD ["/usr/sbin/apache2", "-D", "FOREGROUND", "-C", "listen 8888"]
EXPOSE 8888 19
$ cat Dockerfile

✓ yelpsoa-config directory for devops found in /nail/etc/serv
✓ deploy.yaml exists for a Jenkins pipeline
✗ No 'security-check' entry was found in your deploy.yaml.
Please add a security-check entry *AFTER* the itest entry in d
so your docker image can be checked against known security vul
More info: http://servicedocs.yelpcorp.com/docs/paasta_tools/c
✗ No 'performance-check' entry was found in your deploy.yaml.
Please add a performance-check entry *AFTER* the security-chec
so your docker image can be checked for performance regression
More info: http://servicedocs.yelpcorp.com/docs/paasta_tools/c
✓ Jenkins build pipeline found
✓ Git repo found in the expected location.
✓ Found Dockerfile
✓ A Makefile is present
✓ The Makefile contains a tab character
✓ The Makefile contains a docker tag
✓ The Makefile responds to `make cook-image`
✓ The Makefile responds to `make itest`
✓ The Makefile responds to `make test`
✓ Found marathon.yaml file.
✓ All entries in deploy.yaml correspond to a marathon or chro
✓ All marathon instances have a corresponding deploy.yaml ent
✓ monitoring.yaml found for Sensu monitoring
✓ Your service uses Sensu and team 'nhandler' will get alerts
✓ Found smartstack.yaml file
✓ Instance 'demo' of your service is using smartstack port 20
balanced
✓ Successfully validated schema: marathon-nova-devc.yaml
20
$ paasta check

PaaSTA Components
21
Docker
Registry
Developer
git push
git pull
git push
docker push
21

---
pipeline:
- instancename: itest
- instancename: push-to-registry
- instancename: dev.everything
22
$ cat deploy.yaml

paasta-nova-devc.demo-20160503T231914-deploy
paasta-nova-devc.demo-20160503T234021-stop
paasta-nova-devc.demo-20160503T234202-start
24
$ git tag

Describe end goal, not path
Helps us achieve fault tolerance.
"Deploy 6de16ff2 to prod"
vs.
"Commit 6de16ff2 should be running in prod"
Gas pedal vs. Cruise Control
25
Declarative control

Description: A demo PaaSTA service for OSCON 2016
External Link:
http://conferences.oreilly.com/oscon/open-source-us/public/schedule/detail/49358
Monitored By: team nhandler
Runbook: Please set a `runbook` field in your monitoring.yaml. Like "y/rb-m
Docs: https://trac.yelpcorp.com/wiki/HowToService/Monitoring/monitoring.yam
Git Repo: git@git.yelpcorp.com:services/devops
Jenkins Pipeline: https://jenkins.yelpcorp.com/view/services-devops
Deployed to the following clusters:
- nova-devc (N/A)
Smartstack endpoint(s):
- http://169.254.255.254:20973 (demo)
Dashboard(s):
- https://uchiwa.yelpcorp.com/#/events?q=devops (Sensu Alerts)
26
$ paasta info

PaaSTA Components
27
Docker
Registry
Developer
git push
git pull
git push
docker push
27
Marathon

Mesos is an "SDK for distributed systems", not
batteries-included.
Requires a framework
• Marathon
• Chronos
Can run many frameworks on the same cluster
Supports Docker as task executor
28
mesosphere.io
mesos.apache.org
Scheduling: Mesos + Marathon

---
main:
cpus: 0.1
instances: 3
mem: 500
29
$ cat marathon-nova-devc.yaml

PaaSTA Components
30
Docker
Registry
Developer
git push
git pull
git push
docker push
30
Marathon

● Brutal: Stops old versions and starts the new
version, without regard to safety.
● Upthendown: Brings up the new version of the
service and waits until all instances are healthy
before stopping the old versions.
● Downthenup: Stops any old versions and waits for
them to die before starting the new version.
● Crossover: Starts the new version, and gradually kills
instances of the old versions as new instances
become healthy.
Bounce Strategies
31

32
Discovery in PaaSTA: Smartstack
mesos
slave
box2
client
nerve
HAProxy
synapse
box1
service
nerve
mesos
slave
synapse
HAProxy
ZooKeeperMetadata
HTTP request
healthcheck

33
Latency Zones
Superregion
Region
Habitat

---
demo:
advertise: [region]
discover: region
proxy_port: 20973
34
$ cat smartstack.yaml

● Reduce/Eliminate Manual Scaling
● Stop Overprovisioning
● Save Money on our Infrastructure Bill
● Increase Reliability
Why Autoscale?
35

Pipeline: https://jenkins.yelpcorp.com/view/services-devops
cluster: nova-devc
instance: demo
Git sha: 6de16ff2
State: Running - Desired state: Started
Marathon: Healthy - up with (3/3) instances. Status: Running.
Mesos: Healthy - (3/3) tasks in the TASK_RUNNING state.
Smartstack:
Name LastCheck LastChange Status
useast1-devc - Healthy - in haproxy with (3/3) total backends UP in this namespace.
36
$ paasta status

---
main:
cpus: 0.1
instances: 3
mem: 500
38

---
main:
cpus: 0.1
min_instances: 3
max_instances: 5
mem: 500
39

Autoscaling Should Be Flexible
40

● mesos_cpu: Tries to use CPU usage to predict when
to autoscale (Default)
● http: Makes a request on a HTTP endpoint on your
service. Expects a JSON-formatted dictionary with a
'utilization' field containing a number 0-1
● uwsgi: Uses the percentage of non-idle workers as
the utilization metric
Metrics Providers
41

● pid: Uses a PID controller to determine when to
autoscale a service
● threshold: Autoscales when a service’s utilization
exceeds beyond a certain threshold.
● bespoke: Allows a service author to implement their
own autoscaling.
Decision Policies
42

---
main:
cpus: 0.1
min_instances: 3
max_instances: 12
mem: 500
autoscaling:
metrics_provider: http
endpoint: metrics.json
decision_policy: threshold
setpoint: 0.5
43

● Examines all resources tracked by PaaSTA
○ Scales based on the "worst" value
● Supports scaling pools independently
● Supports a configurable target_utilization
● Supports both ASGs and SFRs in AWS
Cluster Autoscaler
45

Cluster: mesosstage
Dashboards:
Marathon RO: http://marathon.paasta-mesosstage.yelp/
Smartstack: http://paasta-mesosstage.yelp:3212
Chronos RO: http://chronos.paasta-mesosstage.yelp/
Mesos: http://mesos.paasta-mesosstage.yelp
Mesos Status: OK
quorum: masters: 3 configured quorum: 2
frameworks:
framework: chronos-2.4.0 count: 1
framework: marathon count: 1
CPUs: 1.00 / 7 in use (14.29%)
Memory: 3.03 / 42.85GB in use (7.07%)
Disk: 10.00 / 153.81GB in use (6.50%)
tasks: running: 9 staging: 1 starting: 0
slaves: active: 7 inactive: 0
Marathon Status: OK
marathon apps: 5
marathon tasks: 9
marathon deployments: 0
Chronos Status: OK
Enabled chronos jobs: 1 46
$ paasta metastatus

module "nova-devc-useast1a-demo" {
source = ".../terraform-modules/paasta_spot_cluster"
cluster = "nova-devc"
ecosystem = "${var.ecosystem}"
min_capacity = 2
max_capacity = 20
initial_target_capacity = 5
ami_type = "paasta"
spot_price = 0.1337
instance_profile = "paasta"
pool = "demo"
}
47
Terraform'ed PaaSTA Pool

● Scaling up is safe and easy
● Scaling down can be risky
Scaling Safely
48

● Utilizes Mesos Maintenance Primitives
● Attempts to dynamically reserve all available
resources on a host
● Scales up the service by the number of instances
running on the host that is in maintenance
● Terminates the host once it is fully drained or it
reaches its timeout
PaaSTA Maintenance
49

PaaSTA Components
50
Docker
Registry
Developer
git push
git pull
git push
docker push
50
Marathon
Sensu

---
team: nhandler
page: true
notification_email: nhandler+devops@yelp.com
51
$ cat monitoring.yaml

● More advanced decision policies
● Infrastructure-Agnostic
● Better integration with PaaSTA/Mesos
Why not AWS (or something else)?
53

@YelpEngineering
fb.com/YelpEngineers
engineeringblog.yelp.com
github.com/yelp
54

PaaSTA: Autoscaling at Yelp

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (15)

Similar to PaaSTA: Autoscaling at Yelp

Similar to PaaSTA: Autoscaling at Yelp (20)

Recently uploaded

Recently uploaded (20)

PaaSTA: Autoscaling at Yelp