Scalable web apps
execution time vs development time

Piotr Pelczar
me@athlan.pl

Types of scaling
• Vertical scaling (scale up)
• Horizontal scaling (scale out)

Think about your app as a worker, not a single instance
[Diagram: a load balancer distributing requests to App #1 to App #5 running on Server #1 to Server #3]

Think about your app as a worker, not a single instance
[Diagram: load balancers distributing requests to App #1 to App #5 running on Server #1 to Server #n]

Sessions
We need a session storage that is:
• Common (shared by all app instances)
• Fast
• Persistent

Sessions
[Diagram: App #1 to App #5 on Server #1 to Server #3 behind a load balancer, all connected to one shared session storage]

Sessions - Redis
• Key-value in-memory database (hash-table based)
• Scalable up to 1k nodes
• Partitioning with query routing
• Non-blocking master-slave replication on nodes
• Clustered (currently not production ready)
http://athlan.pl/symfony2-redis-session-handler/

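The reason for a shared session storage can be shown with a minimal sketch. A plain dict with expiry timestamps stands in for Redis here, which is an assumption made so the example runs without a server. With Redis you would store each session under its session id with a TTL (for example with SETEX). The class and method names are hypothetical.

```python
import json
import secrets
import time
from typing import Callable, Optional


class SharedSessionStore:
    """Hypothetical session store shared by all app instances.

    A dict with expiry timestamps stands in for Redis (SETEX/GET by session id).
    """

    def __init__(self, ttl_seconds: int = 1800, clock: Callable[[], float] = time.time):
        self._data: dict[str, tuple[float, str]] = {}
        self._ttl = ttl_seconds
        self._clock = clock

    def create(self, payload: dict) -> str:
        session_id = secrets.token_hex(16)
        self.save(session_id, payload)
        return session_id

    def save(self, session_id: str, payload: dict) -> None:
        # Serialize, so the value could live in any external key-value store.
        self._data[session_id] = (self._clock() + self._ttl, json.dumps(payload))

    def load(self, session_id: str) -> Optional[dict]:
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, raw = entry
        if expires_at <= self._clock():
            # Expired: behave like a key whose TTL ran out.
            del self._data[session_id]
            return None
        return json.loads(raw)


class AppInstance:
    """One worker; it keeps no session state of its own."""

    def __init__(self, name: str, store: SharedSessionStore):
        self.name = name
        self.store = store

    def handle(self, session_id: str) -> str:
        session = self.store.load(session_id)
        user = session["user"] if session else "anonymous"
        return f"{self.name} serves {user}"
```

The workers hold no local session state. Any of them can therefore answer a request, whichever node the load balancer picks.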
Redis - Partitioning with query routing
[Diagram: the query is sent to a random node: Node #1 misses, Node #2 hits and routing stops there; Node #3]
Also supported:
• Client-side partitioning (the app calls the appropriate node)
• Proxy-assisted partitioning (a proxy selects the appropriate node)

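Query routing can be simulated in a small sketch. In Redis Cluster the key space is split into 16384 hash slots, and a key's slot is CRC16(key) mod 16384. A node that does not own the slot answers with a MOVED redirect, and the client retries against the owner. The nodes here are in-memory stand-ins. The names `Node` and `routed_get` are hypothetical, and hash tags are ignored.

```python
import binascii
import random

SLOTS = 16384  # Redis Cluster splits the key space into 16384 hash slots


def key_slot(key: str) -> int:
    # CRC16 (XMODEM variant) mod 16384, as in the Redis Cluster spec;
    # binascii.crc_hqx with an initial value of 0 computes that CRC.
    # Hash tags such as "{user1000}" are ignored in this sketch.
    return binascii.crc_hqx(key.encode(), 0) % SLOTS


class Node:
    def __init__(self, name: str, first_slot: int, last_slot: int):
        self.name = name
        self.first_slot, self.last_slot = first_slot, last_slot
        self.data: dict[str, str] = {}

    def owns(self, slot: int) -> bool:
        return self.first_slot <= slot <= self.last_slot

    def get(self, key: str, cluster: list["Node"]):
        slot = key_slot(key)
        if self.owns(slot):
            return "HIT", self.data.get(key)
        # Miss: answer with the owner of the slot, like a MOVED redirect.
        owner = next(node for node in cluster if node.owns(slot))
        return "MOVED", owner


def routed_get(cluster: list[Node], key: str, rng: random.Random):
    """Query a random node and follow at most one redirect to the owner."""
    node = rng.choice(cluster)
    hops = [node.name]
    status, result = node.get(key, cluster)
    if status == "MOVED":
        node = result
        hops.append(node.name)
        status, result = node.get(key, cluster)
    return result, hops
```

With three nodes covering slots 0-5460, 5461-10922 and 10923-16383, the key "123456789" hashes to slot 12739. Whichever node is asked first, the lookup therefore ends on the third node.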
Centralized Logging
• Logs should be centralized, so you do not have to inspect each node separately
• Approaches:
  – File replication (rsync + cron)
  – syslog (easy to integrate with log4j)
    • syslogd over UDP, port 514
    • rsyslog over TCP, stores data in a DB

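Here is a minimal sketch of the syslog approach using Python's standard `logging.handlers.SysLogHandler`, which sends records over UDP by default. The host and port are placeholders for your syslogd/rsyslog collector. The function name `make_central_logger` is hypothetical.

```python
import logging
import logging.handlers
import socket


def make_central_logger(name: str, host: str, port: int = 514) -> logging.Logger:
    """Return a logger that ships every record to a central syslog collector over UDP.

    host and port are placeholders for your syslogd/rsyslog server.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.handlers.SysLogHandler(address=(host, port), socktype=socket.SOCK_DGRAM)
    # Prefix each line with this node's hostname so the collector can tell nodes apart.
    node = socket.gethostname()
    handler.setFormatter(logging.Formatter(f"{node} %(name)s: %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

Each node calls, for example, `make_central_logger("webapp", "logs.example.internal")` (a hypothetical host) once at startup. After that, ordinary `logger.info(...)` calls end up in one place.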
Common storage, no local changes!
• Keep storage available to all nodes
  – Symfony2 Gaufrette Bundle
    • FTP
    • Amazon S3
    • OpenCloud
    • AzureBlobStorage
    • Rackspace

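The idea of a filesystem abstraction can be sketched briefly. The application talks only to an interface, and the adapter decides where the bytes actually live. The interface below is hypothetical and written in the spirit of Gaufrette's adapters; it is not Gaufrette's API. An S3 or FTP adapter would implement the same three methods.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class Storage(ABC):
    """Hypothetical storage interface in the spirit of Gaufrette adapters (not its API)."""

    @abstractmethod
    def write(self, key: str, content: bytes) -> None: ...

    @abstractmethod
    def read(self, key: str) -> bytes: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...


class LocalDirectoryStorage(Storage):
    """Adapter for a directory, e.g. a network mount shared by all nodes."""

    def __init__(self, root: str):
        self.root = Path(root)

    def _path(self, key: str) -> Path:
        path = (self.root / key).resolve()
        # Refuse keys such as "../etc/passwd" that escape the storage root.
        if self.root.resolve() not in path.parents:
            raise ValueError(f"invalid key: {key!r}")
        return path

    def write(self, key, content):
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(content)

    def read(self, key):
        return self._path(key).read_bytes()

    def exists(self, key):
        return self._path(key).is_file()


class InMemoryStorage(Storage):
    """Adapter useful in tests."""

    def __init__(self):
        self._files: dict[str, bytes] = {}

    def write(self, key, content):
        self._files[key] = content

    def read(self, key):
        return self._files[key]

    def exists(self, key):
        return key in self._files


def save_avatar(storage: Storage, user_id: int, image: bytes) -> str:
    # Application code depends only on the interface, never on the local disk.
    key = f"avatars/{user_id}.png"
    storage.write(key, image)
    return key
```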
Architecture
[Diagram: a load balancer in front of App #1 to App #5 on Server #1 to Server #3, all sharing a session storage, a files storage abstraction and centralized logging]

Continuous Integration
• To keep all nodes up to date, you need CI
• Automate disabling nodes, building and deploying
  – Jenkins CI

Continuous Integration
1. Disable the service on the node
2. Deploy/build the app
   1. Copy files
   2. Update the DB schema (Liquibase, ORM schema update)
   3. Execute scripts
3. Re-run the service

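The steps above amount to a rolling deployment, sketched below. Each step is a hypothetical callable that receives the node name. In a real pipeline, for example a Jenkins job, the steps would call SSH, the load balancer, Liquibase and so on. Running the schema update on every node is harmless only if the migration tool skips changes it has already applied, as Liquibase does.

```python
from typing import Callable, Iterable

# Each step is a hypothetical callable taking the node name.
Step = Callable[[str], None]


def rolling_deploy(nodes: Iterable[str], disable: Step, copy_files: Step,
                   update_schema: Step, run_scripts: Step, enable: Step) -> list[str]:
    """Deploy to one node at a time so the other nodes keep serving traffic.

    If any step raises, the loop stops and the failing node stays disabled,
    while the nodes not yet touched keep running the old version.
    """
    deployed = []
    for node in nodes:
        disable(node)        # 1. take the node out of the load balancer
        copy_files(node)     # 2.1 copy the new build
        update_schema(node)  # 2.2 update the DB schema
        run_scripts(node)    # 2.3 execute post-deploy scripts
        enable(node)         # 3. put the node back into rotation
        deployed.append(node)
    return deployed
```

Deploying one node at a time keeps capacity available. However, the old and new versions run side by side for a while, so schema changes need to stay compatible with both.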
Balance the payload - HAProxy
Yeah guys, this is the logo :) No diagram is needed, just imagine how it works.
• Very, very fast proxy!
• Software TCP/HTTP load balancer
• Different node selection algorithms:
  – roundrobin (limit 4128)
  – static-rr
  – leastconn (lowest number of connections)

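Two of the selection strategies are sketched below in their simplest form. HAProxy's real `roundrobin` and `static-rr` are weighted, and only `roundrobin` lets weights change at runtime. This sketch ignores weights entirely, and the class names are hypothetical.

```python
import itertools


class RoundRobin:
    """Each server is used in turn (weights ignored in this sketch)."""

    def __init__(self, servers: list[str]):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConn:
    """Pick the server with the fewest active connections; ties go to list order."""

    def __init__(self, servers: list[str]):
        self.active = {server: 0 for server in servers}

    def pick(self) -> str:
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        # Call when a connection to the server closes.
        self.active[server] -= 1
```

Round robin is a good fit when requests are short and similar. Least connections adapts better when some connections stay open much longer than others.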
Balance the payload - HAProxy
• You can check a node's status by polling it with a health-check request
• A dead node is excluded from the balancing strategy
  vi /etc/haproxy/haproxy.cfg
    option httpchk HEAD /check.txt HTTP/1.0
    server webA 192.168.0.102:80 check
    server webB 192.168.0.103:80 check

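HAProxy performs this check itself. The sketch below only reproduces the idea, for example for a monitoring script. It sends `HEAD /check.txt` and treats any 2xx or 3xx answer as "alive", which is also what `option httpchk` accepts. The function names are hypothetical. Removing `check.txt` on a node is one common way to drain it before a deploy, though that is an assumption, not something the slides state.

```python
import http.client


def is_alive(host: str, port: int, path: str = "/check.txt", timeout: float = 2.0) -> bool:
    """Mimic an HTTP health check: send HEAD <path> and treat 2xx/3xx as alive."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        status = conn.getresponse().status
        return 200 <= status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout or malformed reply: consider the node dead.
        return False
    finally:
        conn.close()


def alive_servers(servers: dict[str, tuple[str, int]]) -> list[str]:
    """Keep only the servers that pass the check, as the balancer does."""
    return [name for name, (host, port) in servers.items() if is_alive(host, port)]
```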
Balance the payload - HAProxy
• Monitor nodes' status by reading stats from the stats socket via socat:
  echo "show stat" | socat /tmp/haproxy.sock stdio

Balance the payload - HAProxy
• Monitor nodes' status in the built-in stats web console

Nodes Monitoring - Zabbix
• Zabbix: centralized server monitoring

Zabbix + HAProxy
• UserParameter=haproxy.qcur[*],echo "show stat" | socat /tmp/haproxy.sock stdio | grep -i '$1' | sed 's/,/ /g' | awk '{print $$3}'
  – prints the 3rd CSV field (qcur, the current queue) of the stats line matching the item parameter $1

Reverse Proxy and Varnish cache
• Global virtual user = global cache (the reverse proxy caches once on behalf of all users)
http://tomayko.com/writings/things-caches-do

Reverse Proxy – Expiration model
http://tomayko.com/writings/things-caches-do

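Here is a minimal sketch of the expiration model. The cache answers on its own, without contacting the backend, while the stored response is younger than its `max-age`. Shared caches prefer `s-maxage` when it is present. The function names are hypothetical, and a real cache also honours `Expires`, `Age` and other headers.

```python
from typing import Optional


def max_age(cache_control: str) -> Optional[int]:
    """Return the freshness lifetime a shared cache may use, or None."""
    directives = {}
    for part in cache_control.split(","):
        name, _, value = part.strip().partition("=")
        directives[name.lower()] = value.strip('"')
    # These directives forbid a shared cache from answering on its own.
    if any(d in directives for d in ("no-store", "no-cache", "private")):
        return None
    for name in ("s-maxage", "max-age"):
        if directives.get(name, "").isdigit():
            return int(directives[name])
    return None


def is_fresh(cache_control: str, stored_at: float, now: float) -> bool:
    """Expiration model: serve from cache while the copy's age is below max-age."""
    ttl = max_age(cache_control)
    return ttl is not None and (now - stored_at) < ttl
```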
Reverse Proxy – Validation model
http://tomayko.com/writings/things-caches-do

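The validation model is sketched below from the backend's side. The cache revalidates its stored copy by sending `If-None-Match` with the ETag it holds. If the content has not changed, the backend answers `304 Not Modified` with an empty body, so only headers travel. Deriving the ETag from a SHA-1 of the body is an assumption; any stable version identifier works. The function names are hypothetical.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    # A validator derived from the content (an assumption; any stable version id works).
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def handle_get(body: bytes, if_none_match: Optional[str]) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body); 304 with no body when the client's copy is current."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        # If-None-Match uses weak comparison, so a W/ prefix is ignored.
        tags = [t.strip().removeprefix("W/") for t in if_none_match.split(",")]
        if etag in tags or "*" in tags:
            return 304, headers, b""
    return 200, headers, body
```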
Reverse Proxy and Varnish cache
[Diagram: Varnish :80 → Apache :81 → App]

Reverse Proxy and Varnish cache
[Diagram: HAProxy :80 balancing two stacks: Varnish :8080 → Apache :8081 → App, and Varnish :8082 → Apache :8083 → App]

Reverse Proxy and Varnish cache
[Diagram: a single Varnish :80 → HAProxy :81, balancing Apache :8081 → App and Apache :8082 → App]

Varnish and ESI
<!DOCTYPE html>
<html>
<body>
<!-- ... some content -->
<!-- Embed the content of another page here -->
<esi:include src="http://..." />
<!-- ... more content -->
</body>
</html>

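The substitution that an ESI processor performs can be sketched briefly. Each `<esi:include src="..."/>` tag is replaced by the fragment fetched from `src`, so the page shell and each fragment can be cached for different lifetimes. In Varnish the proxy does this itself once ESI processing is enabled in VCL (`beresp.do_esi`). Here `fetch` is a hypothetical stand-in for the backend request, and only the self-closing form of the tag is handled.

```python
import re
from typing import Callable

# Matches self-closing tags such as <esi:include src="/fragments/cart" />
ESI_INCLUDE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


def process_esi(page: str, fetch: Callable[[str], str]) -> str:
    """Replace every <esi:include src="..."/> with the fragment returned by fetch(src)."""
    return ESI_INCLUDE.sub(lambda match: fetch(match.group(1)), page)
```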
Scaling databases - Master slave
[Diagram: writes go to the Master, which replicates to three Slaves; reads go to the Slaves]
• Full data redundancy (every node holds all the data)

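On the application side, master-slave scaling usually means routing each statement to the right server, as in the sketch below. The connection objects are hypothetical (anything with an `execute(sql, *params)` method). Classifying statements by a leading `SELECT` is deliberately naive; for example, `SELECT ... FOR UPDATE` belongs on the master.

```python
import itertools


class ReplicatedDatabase:
    """Route writes to the master and spread reads over the slaves (round robin).

    master and slaves are hypothetical connections with an execute(sql, *params) method.
    """

    def __init__(self, master, slaves):
        self.master = master
        # Fall back to the master when there are no slaves.
        self._slaves = itertools.cycle(slaves or [master])

    def execute(self, sql: str, *params):
        if self._is_read(sql):
            return next(self._slaves).execute(sql, *params)
        return self.master.execute(sql, *params)

    @staticmethod
    def _is_read(sql: str) -> bool:
        # Naive classification: only plain SELECTs may go to a slave.
        return sql.lstrip().upper().startswith("SELECT")
```

Replication is asynchronous in many setups, so a read issued right after a write may not see it on a slave yet. Reads that must see their own writes are safer on the master.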
MongoDB scaling
• Common models to spread data over nodes:
  – range keys
  – hash keys
• Many nodes on cheap machines
• No full data redundancy: each node holds only part of the data

MongoDB – range-based keys
http://docs.mongodb.org
• Great for range queries (data comes from the minimum number of nodes: query isolation)
• Not good at distributing data over nodes when the key increases monotonically

MongoDB – hash-based keys
http://docs.mongodb.org
• Note: not good for range queries, which need merge-sorting across nodes; no query isolation in this case
• Write scaling
  – Writes go to many nodes simultaneously (mind the readers-writer lock, where a write is exclusive)

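The trade-off between the two sharding models fits in a short sketch. With range keys, a monotonically increasing key sends every new insert to the last shard, while a range query touches only the shards that overlap it. With hashed keys, the same inserts spread out, but a range query has to visit every shard. An MD5-based 64-bit hash stands in for MongoDB's own hashing, and the split points and function names are hypothetical.

```python
import bisect
import hashlib


def range_shard(key: int, split_points: list[int]) -> int:
    """Range-based: shard i holds keys in [split_points[i-1], split_points[i])."""
    return bisect.bisect_right(split_points, key)


def hash_shard(key: int, shards: int) -> int:
    """Hash-based: split the 64-bit hash space into equal ranges, one per shard."""
    digest = hashlib.md5(str(key).encode()).digest()
    hashed = int.from_bytes(digest[:8], "big")  # value in [0, 2**64)
    return (hashed * shards) >> 64


def shards_for_range(lo: int, hi: int, split_points: list[int]) -> set[int]:
    """Range keys: a query for [lo, hi] touches only the overlapping shards (query isolation)."""
    return set(range(range_shard(lo, split_points), range_shard(hi, split_points) + 1))
```

With split points `[1000, 2000, 3000]` (four shards), the keys 5000 to 5099 all land on shard 3 under range sharding. A query for 1500 to 1800 needs only shard 1.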
MongoDB sharding and clustering
http://docs.mongodb.org

CQRS
• Command Query Responsibility Segregation
  – separate application service layers for writing to and reading from the DB (so each side can use a different data source, such as RAM or the DB)

CQRS
• Examples
  – populate the cache after insert
    • all SELECTs are served from the cache (even if it is stale)
    • consider LFU instead of LRU as the cache eviction policy
  – pre-insert into memory
    • dump results periodically
In both approaches it is convenient to use queues or a data bus!

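The "populate the cache after insert" variant can be sketched as three small parts. A command side writes to the primary store and publishes an event. A projector consumes the events and fills the read cache. A query side reads only from that cache. A `deque` stands in for the queue or data bus (for example a RabbitMQ queue), and dicts stand in for the DB and the cache. All class names are hypothetical.

```python
from collections import deque


class CommandSide:
    """Writes go to the primary store; each change is also published on the bus."""

    def __init__(self, db: dict, bus: deque):
        self.db, self.bus = db, bus

    def create_article(self, article_id: int, title: str) -> None:
        self.db[article_id] = {"id": article_id, "title": title}
        self.bus.append(("article_created", article_id, title))


class ReadModelProjector:
    """Consumes the bus and populates the read cache after the insert."""

    def __init__(self, bus: deque, cache: dict):
        self.bus, self.cache = bus, cache

    def drain(self) -> int:
        handled = 0
        while self.bus:
            event, article_id, title = self.bus.popleft()
            if event == "article_created":
                self.cache[article_id] = title
            handled += 1
        return handled


class QuerySide:
    """All reads are served from the cache, even if it is briefly stale."""

    def __init__(self, cache: dict):
        self.cache = cache

    def title_of(self, article_id: int):
        return self.cache.get(article_id)
```

Until the projector has drained the bus, the query side does not see the new article. That lag is the staleness the slide accepts in exchange for cheap reads.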
Queues, RabbitMQ
• RabbitMQ is based on AMQP (Advanced Message Queuing Protocol)
  – point-to-point
  – publish-and-subscribe
  – queueing, routing
• AMQP is not JMS (Java Message Service is an API, not a protocol)
• A happy Rabbit is an empty Rabbit: to keep HA, do not try to store data (messages) in the queue system in persistent mode

Queues, RabbitMQ
• Simple queue
• Work queues (each message goes to one consumer)
• Publish/Subscribe (many consumers)

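The difference between a work queue and publish/subscribe is modelled in memory below. It does not use RabbitMQ's client API. In a work queue, each message reaches exactly one consumer, and RabbitMQ's default dispatch hands messages to consumers in turn (round robin). With a fanout exchange, every bound queue gets its own copy. The class names are hypothetical.

```python
import itertools
from collections import defaultdict


class WorkQueue:
    """Work queue: each message goes to exactly one consumer, in turn."""

    def __init__(self, consumers: list[str]):
        self._next = itertools.cycle(consumers)
        self.received: dict[str, list[str]] = defaultdict(list)

    def publish(self, message: str) -> None:
        self.received[next(self._next)].append(message)


class FanoutExchange:
    """Publish/subscribe: every bound queue receives a copy of each message."""

    def __init__(self):
        self.queues: dict[str, list[str]] = {}

    def bind(self, queue_name: str) -> None:
        self.queues.setdefault(queue_name, [])

    def publish(self, message: str) -> None:
        for queue in self.queues.values():
            queue.append(message)
```

A work queue spreads load, for example when resizing images across workers. A fanout delivers one event to several independent services, such as a mailer and a statistics collector.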
Box vs spread architecture
• Box architecture
  – no scaling
  – easy to maintain
[Diagram: a single server running the webapp, Redis, RabbitMQ, Varnish and the DB]

Box vs spread architecture
• Spread architecture
  – high availability
  – more integrations, more administration
[Diagram: Server #1 to Server #3 hosting RabbitMQ, Redis, HAProxy, two webapps, two DB shards and two Varnish instances]