Scaling your website

Alejandro Marcu
Dutch PHP Conference 2016

Alejandro Marcu
- Started programming in Logo at 8 years old
- Then moved to Basic, Turbo Pascal, C++, Java
- 2001 – 2004: Various programming jobs in Argentina
- 2004 – 2008: TopCoder
- 2009 – 2015: Facebook

What You Will Learn Today
- Scalable architecture
- Scaling the database
- Caching
- Introducing new features

Single Server
- Hosted or in the cloud
- Web App: Apache/Nginx + PHP
- DB: MySQL, MongoDB, etc.
- Cache: Memcached, Redis
[Diagram: User -> Server running Web App, DB and Cache]

Scaling Vertically
- More RAM
- More cores or faster CPU
- SSD
- RAID
- Network Interfaces

Functional Partitioning
- Servers can have different hardware specs
- More latency
- Limited growth
[Diagram: User -> Data Center: Server 1 (Web App), Server 2 (DB), Server 3 (Cache)]

Splitting the Web App
- Web Front End should be a thin presentation layer
- Services
  - Just another class
  - Remote over SOAP, REST, Thrift
- Start simple, plan for scale
[Diagram: Web Front End, iOS App and Android App -> Back End (Service 1, Service 2, ... Service n) -> DB and Cache]
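
The "just another class" idea can be sketched as below. This is a minimal illustration, not the speaker's code: the `UserService` interface, `LocalUserService` and `render_greeting` are hypothetical names. The front end depends only on the interface, so a remote implementation could be swapped in later.

```php
<?php
// Hypothetical service contract used by the front end.
interface UserService
{
    public function getName(int $userId): ?string;
}

// Start simple: the service is just another class in the same process.
final class LocalUserService implements UserService
{
    /** @var array<int, string> */
    private array $names;

    public function __construct(array $names)
    {
        $this->names = $names;
    }

    public function getName(int $userId): ?string
    {
        return $this->names[$userId] ?? null;
    }
}

// Thin presentation layer: it only knows the interface, so a REST or Thrift
// backed implementation could replace LocalUserService without changes here.
function render_greeting(UserService $users, int $userId): string
{
    $name = $users->getName($userId) ?? 'guest';
    return "Hello, $name!";
}

echo render_greeting(new LocalUserService([1 => 'John']), 1), "\n";
```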

- Back end servers can host one or more services
- Some services can run on more than one server
[Diagram: User -> Data Center: Server 1 (Web Front End), Server 2 (DB), Server 3 (Cache), Server 4 ... Server k (Back End: Service 1 ... Service n)]

Stateless Services
- Don’t store anything locally
- Use external storage (e.g. databases)
- Can use local caching

Stateless Front End
- HTTP Session
  - Cookies
  - External Data Store
- Uploaded Files
  - DFS: GFS, HDFS, GlusterFS
  - Amazon S3

Multiple Front End Servers
Load Balancer:
- Cloud based (Amazon ELB)
- Software (NGINX, HAProxy)
- Hardware (BIG-IP, NetScaler)
[Diagram: User -> Load Balancer -> Front End (Web FE 1 ... Web FE k) -> Back End (Service 1 ... Service n) -> DB and Cache, all inside the Data Center]

Caching static files
- Files that are the same on each request, e.g. jpg, png, css, js, mp3
- Reverse HTTP Proxy
  - Load balancers usually provide this functionality
- CDN (Content Delivery Network)
  - E.g. Akamai, Amazon CloudFront
  - Pay for usage
  - Multiple locations
[Diagram: User gets static content from the CDN and dynamic content from the Data Center]

Multiple Data Centers
- Advantages
  - Lower latency for users
  - Reduced disaster risk
  - Economic opportunities
- Challenges
  - Consistency
  - Latency between data centers
  - Bandwidth between data centers

Scaling relational databases
- Too much data
- Too many reads
- Too many writes
- Want higher availability

Replication
- Usually many more reads than writes
- Higher availability
- Read after write can return stale data
[Diagram: DB clients read and write (R/W) on the Master and read (R) from the Slaves; the Master feeds the Slaves through binlogs]
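
One common way to soften the read-after-write problem is to send reads to the master for a short window after a write. The sketch below assumes that approach; `ReplicaRouter`, the connection names and the 2-second window are hypothetical, and a real setup would usually track the last write per user or per session.

```php
<?php
// Picks a connection name for each query. All names and the lag window
// are illustrative assumptions, not values from the slides.
final class ReplicaRouter
{
    private string $master;
    /** @var string[] */
    private array $slaves;
    private float $lagWindowSeconds;
    private float $lastWriteAt = -INF;
    private int $next = 0;

    public function __construct(string $master, array $slaves, float $lagWindowSeconds = 2.0)
    {
        $this->master = $master;
        $this->slaves = $slaves;
        $this->lagWindowSeconds = $lagWindowSeconds;
    }

    // Writes always go to the master; remember when the write happened.
    public function forWrite(float $now): string
    {
        $this->lastWriteAt = $now;
        return $this->master;
    }

    public function forRead(float $now): string
    {
        // Shortly after a write the slaves may not have applied the binlog
        // yet, so read from the master instead of risking stale data.
        if ($now - $this->lastWriteAt < $this->lagWindowSeconds) {
            return $this->master;
        }
        // Otherwise spread reads over the slaves in round-robin order.
        $slave = $this->slaves[$this->next % count($this->slaves)];
        $this->next++;
        return $slave;
    }
}

$router = new ReplicaRouter('master', ['slave1', 'slave2']);
echo $router->forRead(microtime(true)), "\n";
```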

Functional Partitioning
- Limited growth
- Can separate unrelated functionality
[Diagram: tables User, Post and Payment split across DB 1 and DB 2]

Sharding
- Tables are split into multiple DBs
- Sharding key used to decide which db, e.g. id
- Sharding function, e.g. db(id) = ((id - 1) % 2) + 1
- Searching becomes more complicated

DB 1          DB 2
id  name      id  name
1   John      2   Louise
3   Jack      4   Bob
5   Anne      6   Marie
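
A small sketch of why searching gets harder: a lookup by sharding key touches one db, while a search on any other column has to ask every db. In-memory arrays stand in for the two databases, the function names are hypothetical, and the sharding function is written as ((id - 1) % n) + 1 so that odd ids land in DB 1 as in the tables.

```php
<?php
// Two arrays stand in for DB 1 and DB 2, holding the rows from the slide.
$shards = [
    1 => [1 => 'John', 3 => 'Jack', 5 => 'Anne'],
    2 => [2 => 'Louise', 4 => 'Bob', 6 => 'Marie'],
];

// Sharding function: maps an id to a db number in 1..$dbCount.
function shard_for(int $id, int $dbCount): int
{
    return (($id - 1) % $dbCount) + 1;
}

// Lookup by sharding key: exactly one shard is touched.
function find_by_id(array $shards, int $id): ?string
{
    $db = shard_for($id, count($shards));
    return $shards[$db][$id] ?? null;
}

// Search by a non-key column: shards are scanned one by one until a match.
function find_id_by_name(array $shards, string $name): ?int
{
    foreach ($shards as $rows) {
        $id = array_search($name, $rows, true);
        if ($id !== false) {
            return $id;
        }
    }
    return null;
}

echo find_by_id($shards, 4), "\n";
echo find_id_by_name($shards, 'Anne'), "\n";
```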

Sharding
- E.g., add an extra db
- New sharding function: db(id) = ((id - 1) % 3) + 1
- Most rows have to move to a different db (4 of the 6 below)
- Conclusion: modulo is not a good sharding function

Before (2 DBs):
DB 1          DB 2
id  name      id  name
1   John      2   Louise
3   Jack      4   Bob
5   Anne      6   Marie

After (3 DBs):
DB 1          DB 2          DB 3
id  name      id  name      id  name
1   John      2   Louise    3   Jack
4   Bob       5   Anne      6   Marie
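
The cost of changing the modulus can be checked by counting how many ids change db. A minimal sketch, using ((id - 1) % n) + 1 as the sharding function so the result matches the tables on the slide; `count_moved` is a hypothetical helper.

```php
<?php
// Modulo sharding function: maps an id to a db number in 1..$dbCount.
function shard_for(int $id, int $dbCount): int
{
    return (($id - 1) % $dbCount) + 1;
}

// Counts the ids whose db changes when the number of DBs changes.
function count_moved(array $ids, int $oldCount, int $newCount): int
{
    $moved = 0;
    foreach ($ids as $id) {
        if (shard_for($id, $oldCount) !== shard_for($id, $newCount)) {
            $moved++;
        }
    }
    return $moved;
}

// Going from 2 to 3 DBs with ids 1..6: ids 3, 4, 5 and 6 change db.
echo count_moved(range(1, 6), 2, 3), " of 6 rows move\n"; // prints "4 of 6 rows move"
```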

Consistent Sharding
- Consistent sharding needs fewer reallocations (only 2 of the 6 rows below move)

Before (2 DBs):
DB 1          DB 2
id  name      id  name
1   John      2   Louise
3   Jack      4   Bob
5   Anne      6   Marie

After (3 DBs):
DB 1          DB 2          DB 3
id  name      id  name      id  name
1   John      2   Louise    5   Anne
3   Jack      4   Bob       6   Marie
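
The slides do not say how the consistent mapping is built. One well-known option is a consistent-hash ring, sketched below under that assumption. `HashRing`, the use of `crc32` and the 64 points per db are illustrative choices, and the linear scan in `dbFor` would normally be a binary search.

```php
<?php
// Minimal consistent-hash ring (an assumed technique, not from the slides).
// Assumes 64-bit PHP, where crc32() returns a non-negative integer.
final class HashRing
{
    /** @var array<int, string> ring position => db name, sorted by position */
    private array $ring = [];

    // Places several points for the db on the ring to even out the load.
    public function add(string $db, int $points = 64): void
    {
        for ($i = 0; $i < $points; $i++) {
            $this->ring[crc32("$db#$i")] = $db;
        }
        ksort($this->ring);
    }

    // A key belongs to the first point at or after its own hash.
    public function dbFor(string $key): string
    {
        $hash = crc32($key);
        foreach ($this->ring as $position => $db) {
            if ($position >= $hash) {
                return $db;
            }
        }
        return reset($this->ring); // past the last point: wrap around
    }
}

$keys = array_map(fn (int $id): string => "user:$id", range(1, 1000));

$ring = new HashRing();
$ring->add('db1');
$ring->add('db2');
$before = array_map([$ring, 'dbFor'], $keys);

$ring->add('db3');
$after = array_map([$ring, 'dbFor'], $keys);

$moved = 0;
foreach ($keys as $i => $key) {
    if ($before[$i] !== $after[$i]) {
        $moved++;
    }
}
echo "$moved of 1000 keys moved\n";
```

Keys that move can only move to the new db; nothing is shuffled between db1 and db2.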

Sharding
- Create many logical DBs
- Distribute them across servers

Server 1: DB 1, DB 2, …, DB 16
Server 2: DB 17, DB 18, …, DB 32

Sharding
- Re-distribute DBs when needed
- Need a function to map each db to a server; it can be a configuration

Server 1: DB 1, DB 2, …, DB 16
Server 2: DB 17, DB 18, …, DB 24
Server 3: DB 25, DB 26, …, DB 32
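
A sketch of the two-level mapping: ids map to one of 32 logical DBs with a function that never changes, and a small configuration maps logical DBs to servers. The config layout, server names and function names are hypothetical.

```php
<?php
// Hypothetical configuration: ranges of logical DBs assigned to servers.
// Moving DBs to a new server only means editing this table.
$dbToServer = [
    ['from' => 1,  'to' => 16, 'server' => 'server1'],
    ['from' => 17, 'to' => 24, 'server' => 'server2'],
    ['from' => 25, 'to' => 32, 'server' => 'server3'],
];

const LOGICAL_DBS = 32;

// Fixed mapping id -> logical DB; it stays the same when servers are added.
function logical_db(int $id): int
{
    return (($id - 1) % LOGICAL_DBS) + 1;
}

// Configurable mapping logical DB -> server.
function server_for_db(array $config, int $db): string
{
    foreach ($config as $range) {
        if ($db >= $range['from'] && $db <= $range['to']) {
            return $range['server'];
        }
    }
    throw new OutOfRangeException("No server configured for DB $db");
}

echo server_for_db($dbToServer, logical_db(20)), "\n"; // prints "server2"
```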

Sharding colocation
- Put owned data in the same db as its owner (e.g. shard the post table by user_id)
- Can execute joins

DB 1                          DB 2
user                          user
id  name                      id  name
1   John                      2   Louise
3   Jack                      4   Bob
5   Anne                      6   Marie

post                          post
id   user_id  text            id   user_id  text
100  1        …               143  2        …
125  1        …               110  6        …
180  3        …               175  6        …

Sharding fan-out
- Many-to-many relationships are spread out
- To get friends’ names:
  - Get ids
  - Group by db
  - Query on each db
- Gets worse with more dbs
- Caching helps a lot
- Needs inverse entries

DB 1                          DB 2
user                          user
id  name                      id  name
1   John                      2   Louise
3   Jack                      4   Bob
5   Anne                      6   Marie

friend                        friend
id1  id2                      id1  id2
1    2                        2    1
1    4                        4    1
3    4                        4    3
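
The three steps (get ids, group by db, query on each db) can be sketched with arrays standing in for the tables above. The function names are hypothetical, and the sharding function ((id - 1) % n) + 1 is chosen to match the tables.

```php
<?php
// In-memory stand-ins for the user and friend tables on each db.
$userTables = [
    1 => [1 => 'John', 3 => 'Jack', 5 => 'Anne'],
    2 => [2 => 'Louise', 4 => 'Bob', 6 => 'Marie'],
];
$friendTables = [
    1 => [[1, 2], [1, 4], [3, 4]],
    2 => [[2, 1], [4, 1], [4, 3]],
];

function shard_for(int $id, int $dbCount): int
{
    return (($id - 1) % $dbCount) + 1;
}

// Groups ids by the db that holds them, so each db gets a single query.
function group_by_db(array $ids, int $dbCount): array
{
    $groups = [];
    foreach ($ids as $id) {
        $groups[shard_for($id, $dbCount)][] = $id;
    }
    return $groups;
}

function friend_names(array $userTables, array $friendTables, int $userId): array
{
    $dbCount = count($userTables);

    // 1. Get ids: the user's friend rows are colocated on the user's own db.
    $friendIds = [];
    foreach ($friendTables[shard_for($userId, $dbCount)] as [$id1, $id2]) {
        if ($id1 === $userId) {
            $friendIds[] = $id2;
        }
    }

    // 2. Group by db, 3. query each db once for its share of the ids.
    $names = [];
    foreach (group_by_db($friendIds, $dbCount) as $db => $ids) {
        foreach ($ids as $id) {
            $names[$id] = $userTables[$db][$id];
        }
    }
    ksort($names);
    return $names;
}

print_r(friend_names($userTables, $friendTables, 4));
```

The inverse entries matter here: without the rows (4, 1) and (4, 3) in DB 2, finding the friends of user 4 would require scanning every db.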

Database scaling
- Replication
  - Scales reads, higher availability
- Functional partitioning
  - Limited scalability
  - Helps across the board
- Sharding
  - Scales reads and writes, handles too much data and helps with availability
- These 3 techniques can be combined

Caching application data
- Usually required at large scale
- Key-Value stores
  - Set(key, value[, TTL])
  - Get(key)
  - Delete(key)
- Different levels
  - Client side (e.g. in the browser in JS)
  - In the web server (e.g. APC)
  - Distributed cache (e.g. Redis, Memcached)
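
The Set/Get/Delete interface with an optional TTL can be illustrated with a tiny in-process cache. `ArrayCache` is a hypothetical class for illustration only; the optional `$now` parameter is an assumption added so expiry can be demonstrated without waiting.

```php
<?php
// Minimal in-memory key-value cache with optional TTL (in seconds).
final class ArrayCache
{
    /** @var array<string, array{value: mixed, expiresAt: ?int}> */
    private array $items = [];

    public function set(string $key, $value, ?int $ttl = null, ?int $now = null): void
    {
        $now = $now ?? time();
        $this->items[$key] = [
            'value' => $value,
            'expiresAt' => $ttl === null ? null : $now + $ttl,
        ];
    }

    // Returns null on a miss or when the entry has expired.
    public function get(string $key, ?int $now = null)
    {
        $now = $now ?? time();
        $item = $this->items[$key] ?? null;
        if ($item === null) {
            return null;
        }
        if ($item['expiresAt'] !== null && $item['expiresAt'] <= $now) {
            unset($this->items[$key]); // expired: behave like a miss
            return null;
        }
        return $item['value'];
    }

    public function delete(string $key): void
    {
        unset($this->items[$key]);
    }
}

$cache = new ArrayCache();
$cache->set('greeting', 'hello', 60);
echo $cache->get('greeting'), "\n";
```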

Caching in the web server
- E.g. APC (Alternative PHP Cache)
- Very fast
- Duplicated caching between web servers
- Expensive to invalidate
- Use sparingly, mostly for global data

Distributed cache
- Examples:
  - Redis
  - Memcached (+ McRouter or libmemcached)
- One or more cache servers, shared between clients
- Network latency

Distributed cache
Features to consider:
- Replication
- Partitioning
- Separate pools
- Persistence
- Atomic operations

Cache invalidation
- When the value is no longer valid, usually just delete the key
- Example:
    user_friends:100 => 'John X, Bob Y, Anne Z'
- Need to invalidate when:
  - The user adds or removes friends
  - A friend removes the user as a friend
  - A friend changes their name
- Can you tolerate temporary inconsistencies?
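
A sketch of delete-on-change for the `user_friends` example: reads fill the cache on a miss, and any change deletes the key so the next read rebuilds it. Plain arrays stand in for the database and the cache, and the function names are hypothetical.

```php
<?php
// Arrays stand in for the database and the cache (illustration only).
$db = ['friends' => [100 => ['John X', 'Bob Y', 'Anne Z']]];
$cache = [];

function get_friend_names(array $db, array &$cache, int $userId): string
{
    $key = "user_friends:$userId";
    if (!isset($cache[$key])) {
        // Cache miss: rebuild the value from the source of truth.
        $cache[$key] = implode(', ', $db['friends'][$userId] ?? []);
    }
    return $cache[$key];
}

function remove_friend(array &$db, array &$cache, int $userId, string $name): void
{
    $db['friends'][$userId] = array_values(
        array_diff($db['friends'][$userId], [$name])
    );
    // Invalidate by deleting the key; the next read repopulates it.
    unset($cache["user_friends:$userId"]);
}

echo get_friend_names($db, $cache, 100), "\n";
remove_friend($db, $cache, 100, 'Bob Y');
echo get_friend_names($db, $cache, 100), "\n";
```

The other two triggers on the slide (a friend removing the user, a friend changing their name) would need the same delete for every user whose cached list contains that friend.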

Cache versioning
- What happens if you change the structure of the values? Example:
    (old) user_friends:100 => 'John X, Bob Y, Anne Z'
    (new) user_friends:100 => '1:John X, 25:Bob Y, 37:Anne Z'
  - New code breaks with old-style values
  - Old code breaks with new-style values
- Solution: use versions:
    (old) user_friends:100:1 => 'John X, Bob Y, Anne Z'
    (new) user_friends:100:2 => '1:John X, 25:Bob Y, 37:Anne Z'
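
One way to apply this is to build every key through a helper that appends a version number, and to bump the number whenever the value format changes. A minimal sketch; the constant and function names are hypothetical.

```php
<?php
// Bump this whenever the format of the cached value changes. Old and new
// code then read and write different keys and never see each other's data.
const USER_FRIENDS_VERSION = 2;

function user_friends_key(int $userId, int $version = USER_FRIENDS_VERSION): string
{
    return "user_friends:$userId:$version";
}

// Decoder for the version 2 format: "id:name" pairs separated by ", ".
function decode_friends_v2(string $value): array
{
    $friends = [];
    foreach (explode(', ', $value) as $pair) {
        [$id, $name] = explode(':', $pair, 2);
        $friends[(int) $id] = $name;
    }
    return $friends;
}

echo user_friends_key(100), "\n"; // prints "user_friends:100:2"
print_r(decode_friends_v2('1:John X, 25:Bob Y, 37:Anne Z'));
```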

Introducing new features
Objectives:
- A/B testing
- Quickly revert if needed
- Protect infrastructure
- Ease of development

Some possibilities:
1. Development branch
2. Feature toggle
3. Percentage Rollout
4. Advanced Rollout

Development Branch
- New branch for the feature, merge when finished
- Can be fine in the early stages
- No extra setup or complexity
- Long-lived branch, may be hard to merge

Feature Toggle
- Can be changed at run time (console or configuration)
- Should distinguish prod from testing
- Allows for intermediate commits
- Code structure:
    if (feature_enabled('homepage_redesign')) {
        new_homepage();
    } else {
        old_homepage();
    }
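
A possible shape for `feature_enabled`, as a sketch only: toggles live in a configuration keyed by environment, so prod and testing can differ. The config layout and the extra `$toggles` and `$env` parameters are assumptions; the slide's version takes just the feature name.

```php
<?php
// Hypothetical toggle configuration. In practice it could be loaded from a
// config file or a data store so that it can change at run time.
$toggles = [
    'prod'    => ['homepage_redesign' => false],
    'testing' => ['homepage_redesign' => true],
];

function feature_enabled(array $toggles, string $env, string $feature): bool
{
    // Unknown environments and unknown features are treated as "off".
    return $toggles[$env][$feature] ?? false;
}

echo feature_enabled($toggles, 'testing', 'homepage_redesign') ? "new\n" : "old\n";
```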

Percentage Rollout
- Dynamically control the percentage of users for a feature
- When increasing the percentage, should include previous users
- Code structure:
    if (feature_enabled('homepage_redesign', $user_id)) {
        new_homepage();
    } else {
        old_homepage();
    }
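
To keep previous users included as the percentage grows, each user can be given a stable bucket from 0 to 99 and the feature enabled for buckets below the percentage. A minimal sketch under that assumption; the function names and the use of `crc32` are illustrative choices.

```php
<?php
// Maps a (feature, user) pair to a stable bucket in 0..99.
// Assumes 64-bit PHP, where crc32() returns a non-negative integer.
function rollout_bucket(string $feature, int $userId): int
{
    return crc32("$feature:$userId") % 100;
}

// A user enabled at 10% has a bucket below 10, so the same user is still
// enabled at 50%: raising the percentage only ever adds users.
function feature_enabled_for(string $feature, int $userId, int $percentage): bool
{
    return rollout_bucket($feature, $userId) < $percentage;
}

var_dump(feature_enabled_for('homepage_redesign', 42, 100)); // bool(true)
```

Hashing the feature name together with the user id means different features get different subsets of users.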

Advanced Rollout
- Turn features on or off for a percentage of users that:
  - Are employees
  - Are in another rollout group
  - Use a certain language
  - Are in a certain country
- Individually whitelist or blacklist people
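
These rules could be combined as in the sketch below: individual overrides first, then audience filters, then a percentage of whoever is left. The rule and user array layouts and the `rollout_allows` name are hypothetical, and the "another rollout group" condition is left out for brevity.

```php
<?php
// Decides whether a feature is on for a user, given a hypothetical rule:
// ['whitelist' => int[], 'blacklist' => int[], 'employees_only' => bool,
//  'countries' => string[], 'languages' => string[], 'percentage' => int]
function rollout_allows(string $feature, array $rule, array $user): bool
{
    // Individual overrides win over everything else.
    if (in_array($user['id'], $rule['blacklist'] ?? [], true)) {
        return false;
    }
    if (in_array($user['id'], $rule['whitelist'] ?? [], true)) {
        return true;
    }
    // Audience filters: the user must match every filter that is set.
    if (($rule['employees_only'] ?? false) && !$user['is_employee']) {
        return false;
    }
    if (isset($rule['countries']) && !in_array($user['country'], $rule['countries'], true)) {
        return false;
    }
    if (isset($rule['languages']) && !in_array($user['language'], $rule['languages'], true)) {
        return false;
    }
    // Percentage of the matching audience, using a stable per-user bucket.
    // Assumes 64-bit PHP, where crc32() returns a non-negative integer.
    return crc32("$feature:{$user['id']}") % 100 < ($rule['percentage'] ?? 100);
}

$rule = ['employees_only' => true, 'countries' => ['NL'], 'percentage' => 100];
$user = ['id' => 1, 'is_employee' => true, 'country' => 'NL', 'language' => 'nl'];
var_dump(rollout_allows('homepage_redesign', $rule, $user)); // bool(true)
```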

- Some frameworks to check out:
  - Swivel
  - Opensoft/rollout
  - LaunchDarkly
- Don’t forget to clean up the old code paths

Contact Information
amarcu@gmail.com
/alejandro.marcu
/alejandromarcu
@AlejandroMarcu
/in/alejandromarcu
