Scaling your website

Alejandro Marcu
Dutch PHP Conference 2016

Alejandro Marcu
• Started programming in Logo at 8 years old
• Then moved on to Basic, Turbo Pascal, C++, Java
• 2001 – 2004: Various programming jobs in Argentina
• 2004 – 2008: TopCoder
• 2009 – 2015: Facebook

What You Will Learn Today
• Scalable architecture
• Scaling the database
• Caching
• Introducing new features

Single Server
• Hosted or in the cloud
• Web App: Apache/Nginx + PHP
• DB: MySQL, MongoDB, etc.
• Cache: Memcached, Redis
[Diagram: the user connects to a single server that runs the Web App, the DB and the Cache]

Scaling Vertically
• More RAM
• More cores or faster CPU
• SSD
• RAID
• Network Interfaces

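Functional Partitioning
• Servers can have different hardware specs
• More latency
• Limited growth
[Diagram: inside one data center, the Web App, the DB and the Cache each run on their own server (Server 1, Server 2, Server 3); the user connects to the Web App]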

Splitting the Web App
• Web Front End should be a thin presentation layer
• Services
  - Just another class
  - Remote over SOAP, REST, Thrift
• Start simple, plan for scale
[Diagram: the Web Front End, the iOS App and the Android App call the Back End (Service 1, Service 2, …, Service n), which uses the DB and the Cache]

• Back end servers can host one or more services
• Some services can run on more than one server
[Diagram: a data center where the Web Front End, the DB and the Cache run on Servers 1 to 3 and the Back End services (Service 1 … Service n) run on Servers 4 to k; the user connects to the Web Front End]

Stateless Services
• Don't store anything locally
• Use external storage (e.g. databases)
• Can use local caching

Stateless Front End
• HTTP Session
  - Cookies
  - External Data Store
• Uploaded Files
  - DFS: GFS, HDFS, GlusterFS
  - Amazon S3
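One way to keep the front end stateless is to put small session data in a signed cookie, so that any front end server can verify and read it. The following is a minimal sketch, not code from the talk: the function names and the shared secret are assumptions, and the data is signed but not encrypted, so it should not contain anything confidential.

```php
<?php
// Hypothetical helpers: session state travels in a signed cookie value
// instead of being stored on the web server that handled the login.

function encode_session(array $data, string $secret): string
{
    $payload = base64_encode(json_encode($data));
    $signature = hash_hmac('sha256', $payload, $secret);
    return $payload . '.' . $signature;
}

function decode_session(string $cookie, string $secret): ?array
{
    $parts = explode('.', $cookie, 2);
    if (count($parts) !== 2) {
        return null;
    }
    [$payload, $signature] = $parts;
    // hash_equals compares in constant time to avoid timing attacks.
    if (!hash_equals(hash_hmac('sha256', $payload, $secret), $signature)) {
        return null; // tampered with, or signed with another key
    }
    $data = json_decode(base64_decode($payload), true);
    return is_array($data) ? $data : null;
}

$secret = 'change-me'; // assumption: the same secret is deployed to every front end server
$cookie = encode_session(['user_id' => 100], $secret);
```

Larger or sensitive session data fits better in the external data store, with only a session id kept in the cookie.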

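Multiple Front End Servers
Load Balancer:
• Cloud based (Amazon ELB)
• Software (NGINX, HAProxy)
• Hardware (BIG-IP, Netscaler)
[Diagram: the user connects to the Load Balancer, which distributes requests across the Front End servers (Web FE 1 … Web FE k); these call the Back End (Service 1 … Service n), which uses the DB and the Cache, all in one data center]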

Caching static files
• Files that are the same on each request, e.g. jpg, png, css, js, mp3, etc.
• Reverse HTTP Proxy
  - Load balancers usually provide this functionality
• CDN (Content Delivery Network)
  - E.g. Akamai, Amazon CloudFront
  - Pay for usage
  - Multiple locations
[Diagram: the user gets static content from the CDN and dynamic content from the data center]

Multiple Data Centers
• Advantages
  - Lower latency for users
  - Reduced disaster risk
  - Economic opportunities
• Challenges
  - Consistency
  - Latency between data centers
  - Bandwidth between data centers

Scaling relational databases
• Too much data
• Too many reads
• Too many writes
• Want higher availability

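Replication
• Usually there are many more reads than writes
• Higher availability
• A read right after a write can return stale data if it hits a slave that has not caught up yet
[Diagram: DB clients read and write (R/W) on the Master and only read (R) from the Slaves; the Master replicates to the Slaves through binlogs]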
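A common way to live with the read-after-write problem is to send reads to the master for a short time after a write. The sketch below only illustrates that idea: the class name, the server names and the 2-second window are assumptions, and a real application would track the last write per user or per session rather than globally.

```php
<?php
// Hypothetical router: writes go to the master, reads go to a slave,
// except shortly after a write, when reads stay on the master.
class ReplicatedDbRouter
{
    private string $master;
    private array $slaves;
    private float $stickySeconds;
    private ?float $lastWriteAt = null;

    public function __construct(string $master, array $slaves, float $stickySeconds = 2.0)
    {
        $this->master = $master;
        $this->slaves = $slaves;
        $this->stickySeconds = $stickySeconds; // assumption: longer than the usual replication lag
    }

    public function forWrite(float $now): string
    {
        $this->lastWriteAt = $now;
        return $this->master;
    }

    public function forRead(float $now): string
    {
        $recentWrite = $this->lastWriteAt !== null
            && ($now - $this->lastWriteAt) < $this->stickySeconds;
        if ($recentWrite || count($this->slaves) === 0) {
            return $this->master;
        }
        // Spread the remaining reads over the slaves.
        return $this->slaves[array_rand($this->slaves)];
    }
}

$router = new ReplicatedDbRouter('master', ['slave1', 'slave2']);
```

The timestamps are passed in explicitly so the behavior is easy to test; in production code they would come from `microtime(true)`.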

• Limited growth
• Can separate unrelated functionality
[Diagram: the User, Post and Payment tables are split between DB 1 and DB 2]

Sharding
• Tables are split into multiple DBs
• A sharding key is used to decide which DB holds a row, e.g. id
• Sharding function, e.g. db(id) = ((id - 1) % 2) + 1
• Searching becomes more complicated
DB 1          DB 2
id  name      id  name
1   John      2   Louise
3   Jack      4   Bob
5   Anne      6   Marie

Sharding
• E.g., add an extra DB
• New sharding function: db(id) = ((id - 1) % 3) + 1
• Conclusion: modulo is not a good sharding function, because most rows have to move
Before (2 DBs):
DB 1          DB 2
id  name      id  name
1   John      2   Louise
3   Jack      4   Bob
5   Anne      6   Marie
After (3 DBs):
DB 1          DB 2          DB 3
id  name      id  name      id  name
1   John      2   Louise    3   Jack
4   Bob       5   Anne      6   Marie
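To see how many rows move when a database is added, the tables above can be reproduced in a few lines. This is a sketch that assumes the sharding function ((id - 1) % n) + 1, which matches the row placement shown in the tables; the function names are illustrative.

```php
<?php
// Modulo sharding: ids 1, 3, 5 land on DB 1 and ids 2, 4, 6 on DB 2
// when there are two databases.
function shard_for(int $id, int $dbCount): int
{
    return (($id - 1) % $dbCount) + 1;
}

// Ids whose database changes when growing from $before to $after databases.
function moved_ids(array $ids, int $before, int $after): array
{
    return array_values(array_filter(
        $ids,
        fn (int $id): bool => shard_for($id, $before) !== shard_for($id, $after)
    ));
}

$moved = moved_ids(range(1, 6), 2, 3);
echo implode(', ', $moved), "\n"; // prints "3, 4, 5, 6"
```

Four of the six rows change database, which is why adding capacity under modulo sharding is expensive.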

Consistent Sharding
• Consistent sharding needs fewer reallocations
Before (2 DBs):
DB 1          DB 2
id  name      id  name
1   John      2   Louise
3   Jack      4   Bob
5   Anne      6   Marie
After (3 DBs):
DB 1          DB 2          DB 3
id  name      id  name      id  name
1   John      2   Louise    5   Anne
3   Jack      4   Bob       6   Marie
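The slide does not say which scheme produces this placement. One well-known technique with the same property is a consistent-hashing ring, sketched below: when a database is added, a key either stays where it was or moves to the new database. The class name, the number of points per database and the use of `crc32` are assumptions made for illustration.

```php
<?php
// Minimal consistent-hashing ring (hypothetical class, not from the talk).
class HashRing
{
    /** @var array<int, string> ring position => database name */
    private array $ring = [];
    private int $pointsPerDb;

    public function __construct(int $pointsPerDb = 64)
    {
        $this->pointsPerDb = $pointsPerDb;
    }

    public function addDb(string $db): void
    {
        // Each database owns many points on the ring to even out the load.
        for ($i = 0; $i < $this->pointsPerDb; $i++) {
            $this->ring[crc32("$db#$i")] = $db;
        }
        ksort($this->ring);
    }

    // Assumes at least one database has been added.
    public function dbFor(string $key): string
    {
        $hash = crc32($key);
        // The first point at or after the key's hash owns the key.
        foreach ($this->ring as $position => $db) {
            if ($position >= $hash) {
                return $db;
            }
        }
        // Past the last point: wrap around to the first one.
        return $this->ring[array_key_first($this->ring)];
    }
}

$ring = new HashRing();
$ring->addDb('db1');
$ring->addDb('db2');
```

The linear scan in `dbFor` keeps the sketch short; a binary search over the sorted positions would be the usual choice for a large ring.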

Sharding
• Create many logical DBs
• Distribute them across servers
Server 1: DB 1, DB 2, …, DB 16
Server 2: DB 17, DB 18, …, DB 32

Sharding
• Re-distribute DBs when needed
• Need a function to map each DB to a server; it can be a configuration
Server 1: DB 1, DB 2, …, DB 16
Server 2: DB 17, DB 18, …, DB 24
Server 3: DB 25, DB 26, …, DB 32
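The two-level mapping can be sketched as follows: a fixed function maps an id to one of 32 logical DBs, and a configuration maps logical DBs to servers. The configuration format (first logical DB hosted by each server) and the function names are assumptions; the server layout mirrors the three-server distribution above.

```php
<?php
const LOGICAL_DB_COUNT = 32; // fixed up front, so the sharding function never changes

// Assumption: the configuration lists the first logical DB hosted by each server.
// Moving DBs to a new server only changes this array.
$serverMap = [
    1  => 'server1', // DB 1 .. DB 16
    17 => 'server2', // DB 17 .. DB 24
    25 => 'server3', // DB 25 .. DB 32
];

function logical_db(int $id): int
{
    return (($id - 1) % LOGICAL_DB_COUNT) + 1;
}

function server_for_db(int $db, array $serverMap): string
{
    ksort($serverMap);
    $server = null;
    foreach ($serverMap as $firstDb => $name) {
        if ($db >= $firstDb) {
            $server = $name; // keep the last range that starts at or before $db
        }
    }
    if ($server === null) {
        throw new InvalidArgumentException("No server configured for DB $db");
    }
    return $server;
}

echo server_for_db(logical_db(20), $serverMap), "\n"; // prints "server2"
```

Because ids are mapped to logical DBs rather than to servers, redistributing means copying whole logical DBs and updating the configuration, without re-sharding individual rows.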

Sharding colocation
• Put owned data in the same DB as its owner (e.g. shard the post table by user_id)
• Can execute joins
DB 1, user table:
id  name
1   John
3   Jack
5   Anne
DB 1, post table:
id   user_id  text
100  1        …
125  1        …
180  3        …
DB 2, user table:
id  name
2   Louise
4   Bob
6   Marie
DB 2, post table:
id   user_id  text
143  2        …
110  6        …
175  6        …

Sharding fan-out
• Many-to-many relationships are spread out
• To get friends' names:
  - Get ids
  - Group by DB
  - Query on each DB
• Gets worse with more DBs
• Caching helps a lot
• Needs inverse entries
DB 1, user table:
id  name
1   John
3   Jack
5   Anne
DB 1, friend table:
id1  id2
1    2
1    4
3    4
DB 2, user table:
id  name
2   Louise
4   Bob
6   Marie
DB 2, friend table:
id1  id2
2    1
4    1
4    3
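The "group by DB, query on each DB" steps can be sketched like this. The sharding function and the in-memory arrays standing in for the two user tables are assumptions; `$queryDb` is a hypothetical callback that would run one query (for example `SELECT id, name FROM user WHERE id IN (...)`) against the given database.

```php
<?php
function shard_for(int $id, int $dbCount): int
{
    return (($id - 1) % $dbCount) + 1;
}

// Group ids by the database that holds them, so each database is queried once.
function group_by_db(array $ids, int $dbCount): array
{
    $groups = [];
    foreach ($ids as $id) {
        $groups[shard_for($id, $dbCount)][] = $id;
    }
    ksort($groups);
    return $groups;
}

// $queryDb: fn(int $db, int[] $ids): array<int, string> mapping id => name.
function fetch_names(array $ids, int $dbCount, callable $queryDb): array
{
    $names = [];
    foreach (group_by_db($ids, $dbCount) as $db => $idsOnDb) {
        $names += $queryDb($db, $idsOnDb); // one round trip per database
    }
    ksort($names);
    return $names;
}

// In-memory stand-in for the user tables on DB 1 and DB 2.
$tables = [
    1 => [1 => 'John', 3 => 'Jack', 5 => 'Anne'],
    2 => [2 => 'Louise', 4 => 'Bob', 6 => 'Marie'],
];
$queryDb = fn (int $db, array $ids): array => array_intersect_key($tables[$db], array_flip($ids));

$names = fetch_names([1, 2, 4], 2, $queryDb); // an arbitrary id list spanning both DBs
```

The number of round trips grows with the number of databases touched, which is why caching the resolved names pays off.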

Database scaling
• Replication
  - Scales reads, higher availability
• Functional partitioning
  - Limited scalability
  - Helps across the board
• Sharding
  - Scales reads, writes and data size, and helps with availability
• These 3 techniques can be combined

Caching application data
• Usually required at large scale
• Key-Value stores
  - Set(key, value[, TTL])
  - Get(key)
  - Delete(key)
• Different levels
  - Client side (e.g. in the browser in JS)
  - In the web server (e.g. APC)
  - Distributed cache (e.g. Redis, Memcached)
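The Set/Get/Delete interface with an optional TTL can be illustrated with an in-memory stand-in. This is only a sketch of the semantics, not the API of Redis, Memcached or APC; the class name is hypothetical and the current time is passed in explicitly to make expiry easy to follow.

```php
<?php
// In-memory stand-in for a key-value cache with optional expiry.
class ArrayCache
{
    private array $items = []; // key => [value, expiresAt or null]

    public function set(string $key, $value, ?int $ttl = null, ?int $now = null): void
    {
        $now = $now ?? time();
        $this->items[$key] = [$value, $ttl === null ? null : $now + $ttl];
    }

    // Returns null on a miss or when the entry has expired.
    public function get(string $key, ?int $now = null)
    {
        $now = $now ?? time();
        if (!isset($this->items[$key])) {
            return null;
        }
        [$value, $expiresAt] = $this->items[$key];
        if ($expiresAt !== null && $now >= $expiresAt) {
            unset($this->items[$key]); // expired entries behave like a miss
            return null;
        }
        return $value;
    }

    public function delete(string $key): void
    {
        unset($this->items[$key]);
    }
}

$cache = new ArrayCache();
$cache->set('user_name:100', 'John X', 60, 1000); // expires at t = 1060
```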

Caching in the web server
• E.g. APC (Alternative PHP Cache)
• Very fast
• Duplicated caching between web servers
• Expensive to invalidate
• Use sparingly, mostly for global data

Distributed cache
• Examples:
  - Redis
  - Memcached (+ McRouter or libmemcached)
• One or more cache servers, shared between clients
• Adds network latency

Distributed cache
Features to consider:
• Replication
• Partitioning
• Separate pools
• Persistence
• Atomic operations

Cache invalidation
• When the value is no longer valid, usually just delete the key
• Example:
    user_friends:100 => 'John X, Bob Y, Anne Z'
• Need to invalidate when:
  - The user adds or removes friends
  - A friend removes the user as a friend
  - A friend changes their name
• Can you tolerate temporary inconsistencies?
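Deleting the key on change pairs naturally with rebuilding the value on the next read. A minimal sketch of that pattern, with a plain array standing in for the cache and a hypothetical loader callback standing in for the database query:

```php
<?php
// Hypothetical cache-aside helper for the user_friends:<id> example.
class FriendNamesCache
{
    private array $cache = [];
    private $loadFromDb; // callable: fn(int $userId): string
    public int $dbLoads = 0;

    public function __construct(callable $loadFromDb)
    {
        $this->loadFromDb = $loadFromDb;
    }

    public function get(int $userId): string
    {
        $key = "user_friends:$userId";
        if (!array_key_exists($key, $this->cache)) {
            // Miss: rebuild the value from the database and store it.
            $this->cache[$key] = ($this->loadFromDb)($userId);
            $this->dbLoads++;
        }
        return $this->cache[$key];
    }

    // Call whenever the user's friend list or a friend's name changes.
    public function invalidate(int $userId): void
    {
        unset($this->cache["user_friends:$userId"]);
    }
}

$dbValue = 'John X, Bob Y';
$friends = new FriendNamesCache(function (int $userId) use (&$dbValue): string {
    return $dbValue; // stand-in for the real query
});
```

Until `invalidate` is called, readers keep seeing the old value, which is the temporary inconsistency the last bullet asks about.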

Cache versioning
• What happens if you change the structure of the values? Example:
    (old) user_friends:100 => 'John X, Bob Y, Anne Z'
    (new) user_friends:100 => '1:John X, 25:Bob Y, 37:Anne Z'
• New code breaks on old-style values
• Old code breaks on new-style values
• Solution: add a version to the key:
    (old) user_friends:100:1 => 'John X, Bob Y, Anne Z'
    (new) user_friends:100:2 => '1:John X, 25:Bob Y, 37:Anne Z'
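In code, the version can live in one constant that is bumped together with the change in value format. A small sketch (the constant and function names are illustrative):

```php
<?php
// Bump this whenever the format of the cached value changes.
const USER_FRIENDS_CACHE_VERSION = 2;

function user_friends_key(int $userId, int $version = USER_FRIENDS_CACHE_VERSION): string
{
    return "user_friends:$userId:$version";
}

echo user_friends_key(100), "\n"; // prints "user_friends:100:2"
```

Old and new code then read and write different keys, so both can run side by side during a deploy; the old entries simply expire or get evicted.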

Introducing new features
Objectives:
• A/B testing
• Quickly revert it if needed
• Protect infrastructure
• Ease of development

Some possibilities:
1. Development branch
2. Feature toggle
3. Percentage Rollout
4. Advanced Rollout

Development Branch
• New branch for the feature, merge when finished
• Can be fine in the early stages
• No extra setup or complexity
• Long-lived branch, may be hard to merge

Feature Toggle
• Can be changed at run time (console or configuration)
• Should distinguish prod from testing
• Allows for intermediate commits
• Code structure:
    if (feature_enabled('homepage_redesign')) {
        new_homepage();
    } else {
        old_homepage();
    }
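The slide leaves `feature_enabled` undefined. One possible shape, assuming the toggles are kept in a configuration array keyed by environment; unlike the one-argument call on the slide, this sketch takes the configuration and environment as explicit parameters to stay self-contained.

```php
<?php
// Assumption: toggles come from a config array; in practice this could be
// a config file or an admin console that can be changed at run time.
$toggles = [
    'prod'    => ['homepage_redesign' => false],
    'testing' => ['homepage_redesign' => true],
];

function feature_enabled(string $feature, array $toggles, string $env): bool
{
    // Unknown features and unknown environments are treated as off.
    return $toggles[$env][$feature] ?? false;
}

if (feature_enabled('homepage_redesign', $toggles, 'testing')) {
    echo "new homepage\n";
} else {
    echo "old homepage\n";
}
```

Defaulting to off means a half-finished feature committed behind a toggle stays hidden until someone enables it explicitly.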

Percentage Rollout
• Dynamically control the percentage of users that get a feature
• When increasing the percentage, users who already had the feature should keep it
• Code structure:
    if (feature_enabled('homepage_redesign', $user_id)) {
        new_homepage();
    } else {
        old_homepage();
    }
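A simple way to make sure earlier users stay included is to hash each user into a fixed bucket from 0 to 99 and enable the feature for buckets below the current percentage. This is a sketch of that idea; the function names and the choice of `md5` for bucketing are assumptions.

```php
<?php
// Stable bucket in 0..99 for a (feature, user) pair. Including the feature
// name gives each feature its own, independent set of early users.
function rollout_bucket(string $feature, int $userId): int
{
    // 7 hex digits fit in a 32-bit integer.
    return hexdec(substr(md5("$feature:$userId"), 0, 7)) % 100;
}

function feature_enabled_for(string $feature, int $userId, int $percentage): bool
{
    return rollout_bucket($feature, $userId) < $percentage;
}
```

Because a user's bucket never changes, raising the percentage from 10 to 50 only adds users; nobody who already had the feature loses it.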

Advanced Rollout
Turn on/off features for a percentage of users that:
• Are employees
• Are in another rollout group
• Use a certain language
• Are in a certain country
• Individually whitelist or blacklist people
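These criteria can be combined into a rule that is checked step by step. The rule format below is purely hypothetical (it is not the format of any of the frameworks mentioned in this talk) and covers only some of the criteria: individual lists win first, then audience filters apply, then the percentage.

```php
<?php
// Hypothetical rule evaluation: explicit lists, then filters, then percentage.
function advanced_enabled(array $rule, array $user): bool
{
    if (in_array($user['id'], $rule['blacklist'] ?? [], true)) {
        return false;
    }
    if (in_array($user['id'], $rule['whitelist'] ?? [], true)) {
        return true;
    }
    if (($rule['employees_only'] ?? false) && !$user['is_employee']) {
        return false;
    }
    if (isset($rule['countries']) && !in_array($user['country'], $rule['countries'], true)) {
        return false;
    }
    if (isset($rule['languages']) && !in_array($user['language'], $rule['languages'], true)) {
        return false;
    }
    // Stable bucket in 0..99, so raising the percentage keeps earlier users.
    $bucket = hexdec(substr(md5($rule['name'] . ':' . $user['id']), 0, 7)) % 100;
    return $bucket < ($rule['percentage'] ?? 0);
}

$rule = [
    'name'       => 'homepage_redesign',
    'countries'  => ['NL', 'AR'],
    'whitelist'  => [7],
    'blacklist'  => [13],
    'percentage' => 100,
];
$user = ['id' => 100, 'is_employee' => false, 'country' => 'NL', 'language' => 'nl'];
```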

• Some frameworks to check out:
  - Swivel
  - Opensoft/rollout
  - LaunchDarkly
• Don't forget to clean up the old code paths

Contact Information
amarcu@gmail.com
/alejandro.marcu
/alejandromarcu
@AlejandroMarcu
/in/alejandromarcu
