Scalable Web Apps

Execution time vs development time

Piotr Pelczar
me@athlan.pl

Types of scaling
• Vertical scaling (scale up)
• Horizontal scaling (scale out)

Think about your app as a worker, not a single instance
[Diagram: one load balancer in front of Servers #1 to #3, which run App #1 to App #5]

Think about your app as a worker, not a single instance
[Diagram: several load balancers in front of Servers #1 to #n, which run App #1 to App #5]

Sessions
We need session storage that is:
• Common (shared by all app instances)
• Fast
• Persistent

Sessions
[Diagram: load balancer in front of Servers #1 to #3 (App #1 to App #5); all apps use one shared session storage]

Sessions - Redis
• In-memory key-value database (hash table based)
• Scales up to 1,000 nodes
• Partitioning with query routing
• Non-blocking master-slave replication between nodes
• Clustering (currently not production ready)

http://athlan.pl/symfony2-redis-session-handler/
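A minimal sketch of a shared session store, assuming a client object that offers `setex`, `get` and `delete` in the way the redis-py client does. `SessionStore`, `InMemoryClient` and the `session:` key prefix are illustrative names that do not come from the slides; the in-memory client only stands in for a real Redis connection so the example runs on its own.

```python
import json
import uuid


class InMemoryClient:
    """Hypothetical stand-in for a Redis client (it ignores expiry)."""

    def __init__(self):
        self._data = {}

    def setex(self, name, time, value):
        # A real Redis server would drop the key after `time` seconds.
        self._data[name] = value

    def get(self, name):
        return self._data.get(name)

    def delete(self, name):
        self._data.pop(name, None)


class SessionStore:
    """Keeps sessions outside the app process so any app instance can read them."""

    def __init__(self, client, ttl_seconds=1800):
        self.client = client
        self.ttl_seconds = ttl_seconds

    def create(self, data):
        session_id = uuid.uuid4().hex
        self.save(session_id, data)
        return session_id

    def save(self, session_id, data):
        self.client.setex("session:" + session_id, self.ttl_seconds, json.dumps(data))

    def load(self, session_id):
        raw = self.client.get("session:" + session_id)
        return json.loads(raw) if raw is not None else None

    def destroy(self, session_id):
        self.client.delete("session:" + session_id)
```

Two `SessionStore` objects built on the same client behave like two app instances behind the load balancer: a session created by one is visible to the other.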

Redis - Partitioning with Query routing
[Diagram: the query is sent to a random node; on a miss it moves on to the next node, on a hit the search stops. Nodes #1 to #3.]

Also supported:
• Client-side partitioning (app calls appropriate node)
• Proxy assisted partitioning (proxy selects appropriate node)
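Client-side partitioning can be sketched in a few lines, assuming the app knows the full node list and hashes each key to pick a node. The node addresses and the `pick_node` helper are made up for illustration.

```python
import zlib

# Hypothetical node addresses.
NODES = ["redis-1:6379", "redis-2:6379", "redis-3:6379"]


def pick_node(key, nodes):
    """Map a key to one node; the same key always maps to the same node."""
    index = zlib.crc32(key.encode("utf-8")) % len(nodes)
    return nodes[index]
```

With a plain modulo like this, changing the number of nodes moves most keys to a different node, so the node list is best kept stable.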

Centralized Logging
• Logs should be centralized so that you do not have to inspect each node separately
• Approaches:
– File replication (rsync + cron)
– syslog (easy to integrate with log4j)
• syslogd over UDP, port 514
• rsyslog over TCP, stores data in a database
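As a rough illustration of the syslog approach, here is a sketch that uses Python's standard `logging.handlers.SysLogHandler` to ship records over UDP. The `build_logger` helper and the log host you pass to it are assumptions, not something the slides prescribe.

```python
import logging
import logging.handlers
import socket


def build_logger(name, host, port=514):
    """Return a logger that ships records to a syslog daemon over UDP."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(
        address=(host, port),
        socktype=socket.SOCK_DGRAM,  # UDP, as used by classic syslogd
    )
    handler.setFormatter(logging.Formatter("%(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Every node would call `build_logger` with the address of the same central log host, so all entries end up in one place.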

Common storage, no local changes!
• Keep storage available to all nodes
– Symfony2 Gaufrette Bundle
• FTP
• Amazon S3
• OpenCloud
• AzureBlobStorage
• Rackspace
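The idea behind a storage abstraction can be sketched as follows. This is not Gaufrette's API; `FileStorage`, `DirectoryStorage`, `InMemoryStorage` and `save_avatar` are hypothetical names chosen to show that application code only talks to an interface, while the adapter behind it can be swapped.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class FileStorage(ABC):
    """Interface the application codes against."""

    @abstractmethod
    def write(self, key, content): ...

    @abstractmethod
    def read(self, key): ...

    @abstractmethod
    def exists(self, key): ...


class DirectoryStorage(FileStorage):
    """Adapter for a directory, for example a mount shared by all nodes."""

    def __init__(self, root):
        self.root = Path(root)

    def write(self, key, content):
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(content)

    def read(self, key):
        return (self.root / key).read_bytes()

    def exists(self, key):
        return (self.root / key).is_file()


class InMemoryStorage(FileStorage):
    """Adapter that keeps files in a dict; handy for tests."""

    def __init__(self):
        self.files = {}

    def write(self, key, content):
        self.files[key] = content

    def read(self, key):
        return self.files[key]

    def exists(self, key):
        return key in self.files


def save_avatar(storage, user_id, image_bytes):
    """Application code depends only on the FileStorage interface."""
    key = "avatars/{}.png".format(user_id)
    storage.write(key, image_bytes)
    return key
```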

Architecture
[Diagram: load balancer in front of Servers #1 to #3 (App #1 to App #5); all apps share the session storage, the files storage abstraction and centralized logging]

Continuous Integration
• To keep all nodes up-to-date, you need CI
• Automate disabling nodes, building and deploying
– Jenkins CI

Continuous Integration
1. Disable service on node
2. Deploy/build app
   1. Copy files
   2. Update db schema (Liquibase, ORM schema update)
   3. Execute scripts
3. Start the service again
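The steps above can be sketched as a rolling deployment loop. The `disable`, `enable` and deploy-step callables are placeholders for whatever your load balancer and build tooling provide; none of them is a real Jenkins or HAProxy call.

```python
def rolling_deploy(nodes, disable, deploy_steps, enable):
    """Update nodes one at a time so the remaining nodes keep serving traffic."""
    updated = []
    for node in nodes:
        disable(node)              # 1. take the node out of rotation
        for step in deploy_steps:  # 2. copy files, update schema, run scripts
            step(node)
        enable(node)               # 3. put the node back into rotation
        updated.append(node)
    return updated
```

If a step raises an exception, the loop stops and the failing node stays disabled, so a broken build never receives traffic.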

Balance the payload - HAProxy

• Very, very fast proxy!
• Software TCP/HTTP load balancer
• Different node selection algorithms:
– roundrobin (limited to 4128 active servers per backend)
– static-rr
– leastconn (lowest number of connections)
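To make the difference between the algorithms concrete, here is a small sketch of round-robin and least-connections selection. It only illustrates the two ideas and is not how HAProxy implements them; the class names are invented.

```python
import itertools


class RoundRobin:
    """Hands out servers in turn."""

    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self):
        return next(self._cycle)


class LeastConn:
    """Picks the server with the fewest open connections."""

    def __init__(self, servers):
        self.connections = {server: 0 for server in servers}

    def pick(self):
        server = min(self.connections, key=self.connections.get)
        self.connections[server] += 1
        return server

    def release(self, server):
        # Call when a connection to the server is closed.
        self.connections[server] -= 1
```

Round-robin suits short, uniform requests; least-connections tends to help when some requests stay open much longer than others.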

• You can check a node's status with periodic health checks (here an HTTP HEAD request)
• A dead node is excluded from the balancing strategy
vi /etc/haproxy/haproxy.cfg
option httpchk HEAD /check.txt HTTP/1.0
server webA 192.168.0.102:80 check
server webB 192.168.0.103:80 check
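What such a check does can be approximated like this: send a HEAD request for the check file and treat a 2xx or 3xx answer as healthy. `http_check` and `live_nodes` are hypothetical helpers written for illustration, not HAProxy code.

```python
import http.client


def http_check(host, port, path="/check.txt", timeout=2.0):
    """Return True if a HEAD request to the node answers with a 2xx or 3xx status."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        return 200 <= conn.getresponse().status < 400
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()


def live_nodes(nodes, probe=http_check):
    """Keep only the nodes whose probe succeeds; dead nodes get no traffic."""
    return [(host, port) for host, port in nodes if probe(host, port)]
```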

• Monitor node status by reading stats from the HAProxy socket via socat.

echo "show stat" | socat /tmp/haproxy.sock stdio

• Monitor node status through the built-in stats web console
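The output of "show stat" is CSV with a header line that starts with "# ", so it is easy to process in a script. The sketch below parses it by column name; the sample text is made up and contains only a few of the real columns.

```python
import csv
import io

# Made-up sample with only a few of the real columns.
SAMPLE = """# pxname,svname,qcur,qmax,scur,status
web,webA,0,2,14,UP
web,webB,3,5,9,DOWN
"""


def parse_show_stat(text):
    """Turn the CSV printed by "show stat" into a list of dicts keyed by column name."""
    # The header line starts with "# ", which is stripped to get clean column names.
    cleaned = text.lstrip("# ")
    return list(csv.DictReader(io.StringIO(cleaned)))


def current_queue(rows, server):
    """Return qcur (requests currently queued) for the named server."""
    for row in rows:
        if row["svname"] == server:
            return int(row["qcur"])
    raise KeyError(server)
```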

Nodes Monitoring - Zabbix
• Zabbix, centralized server monitoring

Zabbix + HAProxy
• UserParameter=haproxy.qcur[*],echo "show stat" | socat /tmp/haproxy.sock stdio | grep -i '$1' | sed 's/,/ /g' | awk '{print $$3}'

Reverse Proxy and Varnish cache
• The reverse proxy acts as one global virtual user, so its cache is shared by all real users

http://tomayko.com/writings/things-caches-do

Reverse Proxy – Expiration model
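In the expiration model the backend says how long a response stays fresh, and the proxy serves its stored copy without contacting the backend until that time has passed. A minimal sketch, assuming a `fetch` callable that returns the body together with a max-age in seconds; `ExpirationCache` is an illustrative name and this is not Varnish code.

```python
import time


class ExpirationCache:
    """Serves a stored response until its max-age has passed."""

    def __init__(self, fetch, clock=time.time):
        self.fetch = fetch    # callable(url) -> (body, max_age_seconds)
        self.clock = clock
        self.entries = {}     # url -> (body, expires_at)

    def get(self, url):
        entry = self.entries.get(url)
        if entry is not None and self.clock() < entry[1]:
            return entry[0]                  # fresh: backend is not contacted
        body, max_age = self.fetch(url)      # stale or missing: ask the backend
        self.entries[url] = (body, self.clock() + max_age)
        return body
```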


Reverse Proxy – Validation model
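In the validation model the proxy always asks the backend, but sends along the validator of its stored copy (for example an ETag in If-None-Match); the backend answers 304 without a body if nothing changed. A minimal sketch with invented helper names, using a SHA-256 digest as the ETag purely as an assumption:

```python
import hashlib


def etag_for(body):
    return '"' + hashlib.sha256(body.encode("utf-8")).hexdigest() + '"'


def backend_response(body, if_none_match=None):
    """Answer 304 with an empty body when the caller's ETag still matches, else 200."""
    etag = etag_for(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, ""
    return 200, {"ETag": etag}, body


class ValidatingCache:
    """Revalidates its stored copy with the backend on every request."""

    def __init__(self, backend):
        self.backend = backend   # callable(url, if_none_match) -> (status, headers, body)
        self.entries = {}        # url -> (etag, body)

    def get(self, url):
        cached = self.entries.get(url)
        status, headers, body = self.backend(url, cached[0] if cached else None)
        if status == 304:
            return cached[1]     # still valid: reuse the stored body
        self.entries[url] = (headers["ETag"], body)
        return body
```

The backend still has to compute the validator, but it can skip rendering and sending the full response.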



[Diagram: Varnish :80 -> Apache :81 -> App]
[Diagram: HAProxy :80 -> Varnish :8080 -> Apache :8081 -> App, and Varnish :8082 -> Apache :8083 -> App]
[Diagram: Varnish :80 -> HAProxy :81 -> Apache :8081 -> App, and Apache :8082 -> App]

Varnish and ESI
<!DOCTYPE html>
<html>
<body>


<esi:include src="http://..." />

</body>
</html>
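What an ESI processor does with such a page can be sketched as follows: each include tag is replaced by the fragment fetched from its src, so fragments can be cached separately from the page. `render_esi` and the `fetch_fragment` callable are illustrative assumptions, not Varnish internals.

```python
import re

ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]*)"\s*/>')


def render_esi(page, fetch_fragment):
    """Replace every <esi:include> tag with the fragment fetched from its src."""
    return ESI_INCLUDE.sub(lambda match: fetch_fragment(match.group(1)), page)
```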

Scaling databases - Master slave
[Diagram: writes go to the Master, which replicates to three Slaves; reads go to the Slaves]
• Full data redundancy: every node holds all the data
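On the application side, master-slave replication usually means routing statements by type. A minimal sketch, assuming connections are represented by plain names and that anything that is not a SELECT counts as a write; `ReplicatedDatabase` is a made-up class.

```python
import itertools


class ReplicatedDatabase:
    """Sends writes to the master and spreads reads over the slaves."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def connection_for(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._slaves) if is_read else self.master
```

Because replication takes time, a read issued right after a write may still see the old data on a slave.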

MongoDB scaling
• Common models to spread data over nodes:
– range keys
– hash keys

• Many nodes on cheap machines
• Each node holds only part of the data (no full redundancy on every node)

MongoDB – range-based keys

http://docs.mongodb.org

• Awesome for range queries (data is read from a minimal number of nodes: query isolation)
• Poor distribution of data over nodes when keys increase monotonically
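Both points can be seen in a small sketch of range-based routing. The shard names and boundaries are invented, and this is not MongoDB's implementation.

```python
import bisect

# Hypothetical layout: exclusive upper bounds of the first two shards.
BOUNDS = [1000, 2000]
SHARDS = ["shard-A", "shard-B", "shard-C"]


def shard_for(key):
    """Return the shard whose key range contains the key."""
    return SHARDS[bisect.bisect_right(BOUNDS, key)]


def shards_for_range(low, high):
    """Return the shards that hold keys in [low, high]."""
    first = bisect.bisect_right(BOUNDS, low)
    last = bisect.bisect_right(BOUNDS, high)
    return SHARDS[first:last + 1]
```

A narrow range query touches a single shard, while ever-growing keys (for example auto-increment ids) all land on the last shard.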

MongoDB – hash-based keys


• Note: a poor fit for range queries, because results from many nodes must be merge-sorted; there is no query isolation in this case
• Write scaling: writes go to many nodes simultaneously (mind the readers-writer lock, under which a write is exclusive)
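Hash-based routing can be sketched in the same style, with invented shard names and SHA-256 chosen only for illustration (MongoDB uses its own hash function).

```python
import hashlib

SHARDS = ["shard-A", "shard-B", "shard-C"]


def shard_for(key):
    """Pick a shard from a hash of the key, so neighbouring keys scatter."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]
```

Consecutive keys now spread over all shards, which helps writes but means a range query has to ask every shard.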

MongoDB sharding and clustering


CQRS
• Command Query Responsibility Segregation
– separate application service layers for writing to and reading from the DB (makes it possible to use different data sources, such as RAM or a DB)

CQRS
• Examples
– post-insert cache population
• all SELECTs are served from the cache (even when stale)
• consider LFU instead of LRU for cache eviction

– pre-insert into memory
• dump results periodically

In both approaches it is convenient to use queues or a data bus!
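The post-insert population example might look like this in code. The service names are invented, and plain dicts stand in for the database and the cache.

```python
class CommandService:
    """Write side: stores the record, then populates the cache (post-insert)."""

    def __init__(self, db, cache):
        self.db = db
        self.cache = cache

    def add_article(self, article_id, title):
        self.db[article_id] = title
        self.cache[article_id] = title


class QueryService:
    """Read side: answers from the cache only and never touches the DB."""

    def __init__(self, cache):
        self.cache = cache

    def get_article(self, article_id):
        return self.cache.get(article_id)
```

In a larger system the cache update would typically be triggered by a message on a queue rather than by a direct call.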

Queues, RabbitMQ
• RabbitMQ is based on AMQP (Advanced Message Queuing Protocol)
– point-to-point
– publish-and-subscribe
– queueing, routing

• AMQP is not JMS (Java Message Service is an API, not a protocol)
• A happy Rabbit is an empty Rabbit
– to keep HA, do not try to store data (messages) in the queue system in persistent mode

Queues, RabbitMQ
• Simple queue
• Work queues (each message goes to one consumer)
• Publish/Subscribe (each message goes to many consumers)
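The difference between the two delivery patterns can be shown without a broker. The sketch below runs in-process and does not use the RabbitMQ client API; `WorkQueue` and `FanOut` are illustrative names.

```python
from collections import deque


class WorkQueue:
    """Each message is handled by exactly one consumer, taken in turn."""

    def __init__(self, consumers):
        self.consumers = deque(consumers)

    def publish(self, message):
        consumer = self.consumers[0]
        self.consumers.rotate(-1)  # the next message goes to the next consumer
        consumer(message)


class FanOut:
    """Each message is delivered to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, consumer):
        self.subscribers.append(consumer)

    def publish(self, message):
        for consumer in self.subscribers:
            consumer(message)
```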

Box vs spread architecture.
• Box architecture
– no scaling
– easy to maintain
[Diagram: a single Server hosting Webapp, Redis, RabbitMQ, Varnish and DB]

Box vs spread architecture.
• Spread architecture
– High availability
– more integration and more administration work
[Diagram: Servers #1 to #3 hosting RabbitMQ, Redis, HAProxy, two Webapps, two DB shards and two Varnish instances]
