Defining scalability
Scalability is the ability to handle increased workload
by repeatedly applying a cost-effective strategy for
extending a system’s capacity.
(CMU paper, 2006)
How well a solution to some problem will work when
the size of the problem increases. When the size
decreases, the solution should still fit. (dictionary.com
and Theo Schlossnagle, 2006)
Disposability Maximize robustness with
fast startup and graceful
shutdown
Disposable processes
Graceful shutdown on
SIGTERM
Handling sudden death:
robust queue backend
Startup and
Shutdown
Automate all the things
Chef
Docker
Gold image based
deployment
Immutable
Handling tasks before
shutdown
Backing Services Treat backing services as
attached resources
No distinction between
local and third-party
services
Easily swap out resources
Export services via port
binding
Become the backing
service for another app
Processes,
concurrency
Stateless processes (not
even sticky sessions)
Process types by work type
We <3 Linux processes
Shared-nothing: adding
concurrency is safe
Process distribution
spanning machines
Statelessness Store everything in a
datastore
Aggregate data
Chandra
Aggregator / map &
reduce
Scalable datastores
Handling user sessions
Monitoring Application state and
metrics
Dashboards
Alerting
Health
Remove failing nodes
Capacity
Act on trends
Monitoring Metrics collection
Graphite, New Relic
Self-aware checks
Cluster state
Zookeeper, Consul
Scaling decision types
Capacity amount
Graph derivative
App requests
Load Balance and
Resource
Allocation
Load Balance: distribute
tasks
Utilize machines
efficiently
VM compatible apps
Flexibility
Adapting to available
resources
Load Balance DNS or API
App level balance
Uniform entry point or
proxy
Balance decisions
Load
Zookeeper state
Resource policies
Service
Separation
Failure is inevitable
Protect from failing
components
Cascading failure
Fail fast
Decoupling
Asynchronous operations
Message queues
Extras Debugging features
Logs
Clojure / JS consoles
Runtime configuration
via env
Scaling API
Integrating several
cloud providers
Automatic start / stop
Reading
Scalable Internet Architectures by Theo Schlossnagle
The 12-factor App: http://12factor.net/
Carnegie Mellon Paper: http://www.sei.cmu.edu/reports/06tn012.pdf
Circuit Breaker: http://martinfowler.com/bliki/CircuitBreaker.html
Release It! by Michael T. Nygard
Definition
Requirements coming from 12-factor, and some added by us
Some more detail and tools on selected requirements
30 day viewer graph. Clear peaks -> need for scaling
Quick description of the streaming stack, roles of components, how they require scaling
- Transcontroller/transcoder scaling
- UMS scaling
Carnegie Mellon University paper by Charles B. Weinstock, John B. Goodenough: On System Scalability
LINFO: The Linux Information Project http://www.linfo.org/
Next: principles
Example: calling imagemagick or curl from code – they might be there or might not be
Bundle everything into the app instead
Disposable process: they can be started or stopped at a moment’s notice
For a web process, graceful shutdown is achieved by ceasing to listen on the service port (thereby refusing any new requests), allowing any current requests to finish, and then exiting. Implicit in this model is that HTTP requests are short (no more than a few seconds), or in the case of long polling, the client should seamlessly attempt to reconnect when the connection is lost.
For a worker process, graceful shutdown is achieved by returning the current job to the work queue.
Docker: build images from dockerfile, deploy from repository
Tasks before shutdown: moving jobs, log collection, sleep
A backing service is any service the app consumes over the network as part of its normal operation. Examples include datastores (such as MySQL or CouchDB), messaging/queueing systems (such as RabbitMQ or Beanstalkd), SMTP services for outbound email (such as Postfix), and caching systems (such as Memcached).
Put a resource locator in the config only – environment variables
Example: Easily swap out a local mysql to a remote service
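The swap works because only a resource handle lives in config: changing `DATABASE_URL` repoints the app from a local MySQL to a remote one with no code change (a sketch; `DATABASE_URL` is the conventional 12-factor name, and the default URL is illustrative):

```python
import os
from urllib.parse import urlparse

def database_config(env):
    # The only thing the app knows about its database is a URL handle
    # taken from the environment, e.g.:
    #   DATABASE_URL=mysql://user:pass@db.example.com:3306/app
    url = urlparse(env.get("DATABASE_URL", "mysql://localhost:3306/app"))
    return url.hostname, url.port or 3306

host, port = database_config(os.environ)
```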
The app does not rely on runtime injection of a webserver into the execution environment to create a web-facing service. The web app exports HTTP as a service by binding to a port, and listening to requests coming in on that port.
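Port binding in a minimal sketch: the app itself opens the socket and serves HTTP, with no webserver injected by the environment (standard library only; the `PORT` variable is the common convention, and the default of 0 is used here just so the sketch can bind anywhere):

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The app exports HTTP as a service on its own port.
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

def make_server():
    # Bind the port given by the environment; 0 lets the OS pick a free one.
    port = int(os.environ.get("PORT", 0))
    return HTTPServer(("127.0.0.1", port), Handler)

# make_server().serve_forever()  # a real app would block here
```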
One app can become the backing service for another app, by providing the URL to the backing app as a resource handle in the config for the consuming app
Handle diverse workloads by assigning each type of work to a process type. For example, HTTP requests may be handled by a web process, and long-running background tasks handled by a worker process
An individual VM can only grow so large (vertical scale), so the application must also be able to span multiple processes running on multiple physical machines.
Aggregate everything within the app and write it out in bulk – be careful about write frequency; must not lose too much data on a crash
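The aggregate-then-write-in-bulk idea can be sketched as a buffer that flushes to the datastore at a size threshold, trading write frequency against how much data a crash can lose (a sketch; the class name and threshold are illustrative):

```python
class Aggregator:
    """Collect data points in memory and write them out in bulk."""

    def __init__(self, store, flush_at=100):
        self.store = store        # any callable that persists a batch
        self.flush_at = flush_at  # smaller -> more writes, less loss on crash
        self.buffer = []

    def add(self, point):
        self.buffer.append(point)
        if len(self.buffer) >= self.flush_at:
            self.flush()

    def flush(self):
        if self.buffer:
            self.store(self.buffer)
            self.buffer = []
```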
Aggregator / map-reduce
Redis: scales reads, writes problematic
Cassandra: quick scaling questionable
Aerospike: scales reads and writes, working together with their eng team
User sessions: persistent connection, NIO+
Alerting -> openduty
Two important groups: Health vs capacity
Report everything to graphite, constantly check graph trends automatically
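Reporting to Graphite uses its plaintext protocol: one `metric value timestamp` line per data point, sent over TCP to the carbon port (a sketch; the host name and metric name are assumptions, port 2003 is Graphite's default plaintext port):

```python
import socket
import time

def format_metric_line(metric, value, timestamp):
    # Graphite plaintext protocol: "<metric> <value> <timestamp>\n"
    return f"{metric} {value} {timestamp}\n"

def send_to_graphite(metric, value, host="graphite.local", port=2003):
    line = format_metric_line(metric, value, int(time.time()))
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```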
Apps are self-aware, they know their health
App instances report into Zookeeper and thus know about each other
Central logic can request resource based on capacity or graph, app can request based on self-check or zookeeper
Zookeeper, Consul: why use them, what their advantages are
Load balancing distributes workloads across multiple computing resources
Flexibility: can increase or decrease its own size, example: Threadpools
Adapting to CPU, RAM, disk, network
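The threadpool example of flexibility can be sketched as a pool that adjusts its own worker-count target from observed load (a minimal sketch, not a full pool implementation; the thresholds are illustrative):

```python
import threading

class ResizablePool:
    """Tracks a worker-count target that scaling logic adjusts at runtime."""

    def __init__(self, size):
        self.size = size
        self.lock = threading.Lock()

    def adapt(self, queue_depth):
        # Grow when work piles up, shrink when idle (illustrative thresholds).
        with self.lock:
            if queue_depth > self.size * 10:
                self.size += 1
            elif queue_depth == 0 and self.size > 1:
                self.size -= 1
            return self.size
```

The same adapt-to-load shape applies to CPU, RAM, disk, and network: measure, compare against the current allocation, grow or shrink.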
App level: transcontroller selects transcoder
App level balance with proxy can be SPOF, careful
Resource policies: even distribution, keep large chunks free for possible large tasks (transcoder use case), group requests together on some attribute (pro, etc)
Failure is inevitable because of: large numbers, hardware issues, independent networks
Decoupling: serving one request should not wait on others
Hystrix by Netflix 2011/12
Circuit Breaker: Martin Fowler post from 2014
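The Circuit Breaker pattern can be sketched minimally: after a threshold of consecutive failures the breaker opens and subsequent calls fail fast instead of waiting on the broken component, then a timeout lets one trial call through (a sketch of the pattern as described in Fowler's post, not the Hystrix API; names and defaults are illustrative):

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=3, reset_timeout=30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: let one trial call through
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()
            raise
        self.failures = 0
        return result
```

Failing fast here is what prevents cascading failure: callers stop queuing up behind a dead dependency.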
Service decoupling example: inserting layers between DB and UMS -> RGW. Then another layer between RGW and UMS -> Queue
Logs: logs as an event stream on stdout (factor #11), collect / transport / process
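Treating logs as a stream means the app just writes events to stdout and leaves collection, transport, and processing to the environment (a minimal sketch; the logger name and format shown are one common setup, not the only one):

```python
import logging
import sys

# The app never manages logfiles: it writes its event stream to stdout
# and lets the execution environment collect, transport, and process it.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("stream started")
```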
Scaling API: Other considerations: price, network line to the cloud provider, instance type (spot vs normal)
Openstack, Ganeti