2. 2
Started programming in Logo at 8 years old
Then moved to Basic, Turbo Pascal, C++, Java
2001 – 2004: Various programming jobs in Argentina
2004 – 2008: TopCoder
2009 – 2015: Facebook
Alejandro Marcu
3. 3
Scalable architecture
Scaling the database
Caching
Introducing new features
What You Will Learn Today
5. 5
Single Server
Hosted or in the cloud
Web App: Apache/Nginx + PHP
DB: MySQL, MongoDB, etc.
Cache: Memcached, Redis
[Diagram: User → Server running the Web App, Cache and DB]
6. 6
More RAM
More cores or faster CPU
SSD
RAID
Network Interfaces
Scaling Vertically
7. 7
Functional Partitioning
Servers can have different
hardware specs
More latency
Limited growth
[Diagram: User → Data Center, where the Web App, Cache and DB each run on their own server (Server 1, Server 2, Server 3)]
8. 8
Splitting the Web App
Web Front End should be a
thin presentation layer
Services
Just another class
Remote over SOAP, REST,
Thrift
Start simple, plan for scale
[Diagram: the Web Front End, iOS App and Android App call the Back End (Service 1, Service 2, ..., Service n), which uses the DB and Cache]
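The "just another class" advice can be sketched as follows, in Python for illustration. The names `UserService`, `LocalUserService`, `RemoteUserService`, `render_profile` and the `transport` callable are all hypothetical; the point is only that the front end depends on an interface, so an in-process class can later be swapped for a remote client.

```python
from typing import Callable, Protocol


class UserService(Protocol):
    """Interface the front end codes against (hypothetical)."""

    def get_name(self, user_id: int) -> str: ...


class LocalUserService:
    """Start simple: the service is just another class in the same process."""

    def __init__(self, names: dict[int, str]) -> None:
        self._names = names

    def get_name(self, user_id: int) -> str:
        return self._names[user_id]


class RemoteUserService:
    """Plan for scale: same interface, but each call goes over the network.

    `transport` stands in for a REST/Thrift/SOAP client. It is an assumption
    of this sketch, not a real library API.
    """

    def __init__(self, transport: Callable[[str, dict], dict]) -> None:
        self._transport = transport

    def get_name(self, user_id: int) -> str:
        response = self._transport("get_name", {"user_id": user_id})
        return response["name"]


def render_profile(service: UserService, user_id: int) -> str:
    """Thin presentation layer: no business logic, only formatting."""
    return f"<h1>{service.get_name(user_id)}</h1>"
```

Because `render_profile` only sees the interface, moving a service to its own server does not require touching the front end code.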
9. 9
Functional Partitioning
Back end servers can have
one or more services
Some services can be in
more than one server
[Diagram: User → Data Center; the Web Front End, Cache and DB run on Server 1 to Server 3, and the Back End services (Service 1 ... Service n) run on Server 4 ... Server k]
10. 10
Don’t store anything locally
Use external storage (e.g. databases)
Can use local caching
Stateless Services
11. 11
HTTP Session
Cookies
External Data Store
Uploaded Files
DFS: GFS, HDFS, GlusterFS
Amazon S3
Stateless Front End
12. 12
Multiple Front End Servers
Load Balancer:
Cloud based (Amazon ELB)
Software (NGINX, HAProxy)
Hardware (BIG-IP,
Netscaler)
[Diagram: User → Load Balancer → Front End (Web FE 1 ... Web FE k) → Back End (Service 1 ... Service n) → Cache and DB, all inside the Data Center]
13. 13
Caching static files
Files that are the same on
each request, e.g. jpg, png,
css, js, mp3, etc
Reverse HTTP Proxy
Load balancers usually
provide this functionality
CDN (Content Delivery
Network)
E.g. Akamai, Amazon
Cloudfront
Pay for usage
Multiple locations
[Diagram: the User fetches static content from the CDN and dynamic content from the Data Center]
14. 14
Advantages
Lower latency for users
Reduced disaster risk
Economic opportunities
Challenges
Consistency
Latency between data centers
Bandwidth between data centers
Multiple Data Centers
16. 16
Too much data
Too many reads
Too many writes
Want higher availability
Scaling relational databases
17. 17
Replication
Usually many more reads than writes
Higher availability
A read right after a write can return stale data (replication lag)
[Diagram: DB clients read and write (R/W) on the Master and read (R) from the Slaves; the Master feeds the Slaves through binlogs]
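A minimal sketch of read/write splitting, in Python for illustration. Plain dicts stand in for database connections and `replicate()` simulates binlog shipping. Pinning a client's reads to the master for a short time after it writes is one common way to soften the stale-read problem; it is an addition of this sketch, not something the slide prescribes, and all names here are hypothetical.

```python
import time


class ReplicatedDB:
    """Route writes to the master and reads to replicas (sketch)."""

    def __init__(self, replica_count: int = 2, pin_seconds: float = 1.0) -> None:
        self.master: dict = {}
        self.replicas: list[dict] = [{} for _ in range(replica_count)]
        self.pin_seconds = pin_seconds
        self._last_write: dict = {}  # client_id -> time of that client's last write
        self._next = 0               # round-robin counter over the replicas

    def write(self, client_id, key, value, now=None) -> None:
        now = time.monotonic() if now is None else now
        self.master[key] = value
        self._last_write[client_id] = now

    def read(self, client_id, key, now=None):
        now = time.monotonic() if now is None else now
        last = self._last_write.get(client_id)
        # A client that wrote recently reads from the master, so it sees its own write.
        if last is not None and now - last < self.pin_seconds:
            return self.master.get(key)
        replica = self.replicas[self._next % len(self.replicas)]
        self._next += 1
        return replica.get(key)

    def replicate(self) -> None:
        """Simulate the replicas catching up with the master."""
        for replica in self.replicas:
            replica.update(self.master)
```

Other clients can still see stale data until the replicas catch up, which is the inconsistency the slide warns about.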
19. 19
Sharding
Tables are split into multiple
DBs
Sharding key used to decide
which db, e.g. id
Sharding function, e.g.
db(id) = ((id - 1) % 2) + 1
Searching becomes more
complicated
DB 1
id  name
1   John
3   Jack
5   Anne

DB 2
id  name
2   Louise
4   Bob
6   Marie
20. 20
Sharding
E.g., add an extra db
New sharding function:
db(id) = ((id - 1) % 3) + 1
Every row whose DB changes must be moved; here 4 of the 6 rows move
Conclusion: modulo is not a good sharding function
Before (2 DBs)

DB 1
id  name
1   John
3   Jack
5   Anne

DB 2
id  name
2   Louise
4   Bob
6   Marie

After (3 DBs)

DB 1
id  name
1   John
4   Bob

DB 2
id  name
2   Louise
5   Anne

DB 3
id  name
3   Jack
6   Marie
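The cost of re-sharding with modulo can be checked with a few lines of Python. This sketch uses `((id - 1) % n) + 1` so that its results match the tables on this slide; the function names are illustrative.

```python
def shard(key_id: int, db_count: int) -> int:
    """Modulo sharding; DBs are numbered from 1."""
    return (key_id - 1) % db_count + 1


def moved_keys(ids, old_count: int, new_count: int) -> list[int]:
    """Ids whose DB changes when the number of DBs changes."""
    return [i for i in ids if shard(i, old_count) != shard(i, new_count)]


# The six users from the slide: ids 3, 4, 5 and 6 change DB.
print(moved_keys(range(1, 7), 2, 3))  # [3, 4, 5, 6]
```

Going from 2 to 3 DBs moves two thirds of all rows, whatever the table size.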
21. 21
Consistent Sharding
Consistent sharding needs fewer reallocations: here only 2 of the 6 rows move when DB 3 is added

Before (2 DBs)

DB 1
id  name
1   John
3   Jack
5   Anne

DB 2
id  name
2   Louise
4   Bob
6   Marie

After (3 DBs)

DB 1
id  name
1   John
3   Jack

DB 2
id  name
2   Louise
4   Bob

DB 3
id  name
5   Anne
6   Marie
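One common way to get this behaviour is a consistent-hash ring. The Python sketch below is an illustration under that assumption, not necessarily the scheme the slide has in mind; `HashRing` and its parameters are hypothetical names, and the ring will not reproduce the exact row placement of the tables above.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Stable hash (Python's built-in hash() of strings varies per process)."""
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (point, node) pairs
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Place `vnodes` points for this node on the ring."""
        for i in range(self._vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def lookup(self, key) -> str:
        """Return the node owning the first ring point at or after the key."""
        point = _hash(str(key))
        index = bisect.bisect_right(self._ring, (point, ""))
        return self._ring[index % len(self._ring)][1]  # wrap around the ring
```

When a node is added, the only keys that change owner are the ones the new node takes over; every other key stays where it was.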
22. 22
Sharding
Create many logical DBs
Distribute them across
servers
Server 1: DB 1, DB 2, …, DB 16
Server 2: DB 17, DB 18, …, DB 32
23. 23
Sharding
Re-distribute DBs when
needed
Need a function to map each DB to a server; this can be a configuration
Server 1: DB 1, DB 2, …, DB 16
Server 2: DB 17, DB 18, …, DB 24
Server 3: DB 25, DB 26, …, DB 32
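The two-level mapping can be sketched in Python as follows. The key-to-logical-DB function is fixed, and only the DB-to-server configuration changes when servers are added. Server names and helper names are hypothetical.

```python
LOGICAL_DB_COUNT = 32  # chosen once and never changed


def logical_db(key_id: int) -> int:
    """Key -> logical DB (1..32). Fixed, so rows never move between logical DBs."""
    return (key_id - 1) % LOGICAL_DB_COUNT + 1


def build_map(ranges: dict[str, range]) -> dict[int, str]:
    """Expand {server: range of logical DBs} into a DB -> server lookup."""
    return {db: server for server, dbs in ranges.items() for db in dbs}


# Two servers with 16 logical DBs each.
two_servers = build_map({"server1": range(1, 17), "server2": range(17, 33)})

# After adding a third server: only DBs 25..32 change server.
three_servers = build_map({
    "server1": range(1, 17),
    "server2": range(17, 25),
    "server3": range(25, 33),
})


def server_for(key_id: int, db_map: dict[int, str]) -> str:
    """Resolve a key to the server that currently hosts its logical DB."""
    return db_map[logical_db(key_id)]
```

Re-distribution then means copying whole logical DBs to the new server and updating the configuration, with no per-row re-hashing.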
24. 24
Sharding colocation
Put owned data in the same DB as its owner (e.g. shard the post table by user_id)
Can execute joins within a DB
DB 1

user
id  name
1   John
3   Jack
5   Anne

post
id   user_id  text
100  1        …
125  1        …
180  3        …

DB 2

user
id  name
2   Louise
4   Bob
6   Marie

post
id   user_id  text
143  2        …
110  6        …
175  6        …
25. 25
Sharding fan-out
Many-to-many relationships
are spread out
To get a user’s friends’ names:
Get ids
Group by db
Query on each db
Gets worse with more dbs
Caching helps a lot
Needs inverse entries (each friendship is stored in both users’ DBs)
DB 1

user
id  name
1   John
3   Jack
5   Anne

friend
id1  id2
1    2
1    4
3    4

DB 2

user
id  name
2   Louise
4   Bob
6   Marie

friend
id1  id2
2    1
4    1
4    3
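The three steps (get ids, group by db, query on each db) can be sketched in Python over in-memory stand-ins for the two DBs shown on this slide. The `SHARDS` structure and function names are illustrative; a real system would issue one SQL query per DB.

```python
from collections import defaultdict

# In-memory stand-ins for DB 1 and DB 2 from the slide.
SHARDS = {
    1: {"user": {1: "John", 3: "Jack", 5: "Anne"},
        "friend": [(1, 2), (1, 4), (3, 4)]},
    2: {"user": {2: "Louise", 4: "Bob", 6: "Marie"},
        "friend": [(2, 1), (4, 1), (4, 3)]},
}


def shard_of(user_id: int) -> int:
    """Odd ids live in DB 1, even ids in DB 2, as in the tables."""
    return (user_id - 1) % len(SHARDS) + 1


def friend_names(user_id: int) -> list[str]:
    # 1. Get ids: a single query on the user's own DB.
    home = SHARDS[shard_of(user_id)]
    friend_ids = [id2 for id1, id2 in home["friend"] if id1 == user_id]

    # 2. Group by db.
    by_shard = defaultdict(list)
    for fid in friend_ids:
        by_shard[shard_of(fid)].append(fid)

    # 3. Query on each db: one query per DB that holds at least one friend.
    names = {}
    for shard_id, ids in by_shard.items():
        users = SHARDS[shard_id]["user"]
        names.update({fid: users[fid] for fid in ids})
    return [names[fid] for fid in friend_ids]


print(friend_names(1))  # ['Louise', 'Bob']
```

Step 1 stays a single-DB query only because the inverse entries exist: the friendship between users 1 and 4 is stored as (1, 4) in DB 1 and as (4, 1) in DB 2.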
26. 26
Replication: scales reads, higher availability
Functional partitioning: limited scalability, but helps across the board
Sharding: scales reads, writes and data volume, and helps with availability
These 3 techniques can be combined
Database scaling
28. 28
Usually required at large scale
Key-Value stores
Set(key, value[, TTL])
Get(key)
Delete(key)
Different levels
Client side (e.g. in the browser in JS)
In the WebServer (e.g. APC)
Distributed cache (e.g. Redis, Memcached)
Caching application data
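The Set/Get/Delete interface can be sketched as a small in-process cache in Python. This is an illustration, not the API of Redis, Memcached or APC. The `get_or_load` helper shows the usual cache-aside read path and is an addition of this sketch; all names are hypothetical.

```python
import time


class KeyValueCache:
    """In-process key-value cache with an optional per-key TTL in seconds."""

    def __init__(self, clock=time.monotonic) -> None:
        self._clock = clock
        self._data: dict = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None) -> None:
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expired: treat as a miss
            return None
        return value

    def delete(self, key) -> None:
        self._data.pop(key, None)


def get_or_load(cache, key, load, ttl=None):
    """Cache-aside read: try the cache, call the loader only on a miss."""
    value = cache.get(key)
    if value is None:
        value = load()
        cache.set(key, value, ttl)
    return value
```

A typical call would be `get_or_load(cache, "user_friends:100", load_friends_from_db, ttl=300)`, where `load_friends_from_db` is a hypothetical function that runs the expensive query.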
29. 29
E.g. APC (Alternative PHP Cache)
Very fast
Duplicated caching between web servers
Expensive to invalidate
Use sparingly, mostly for global data
Caching in the web server
30. 30
Examples:
Redis
Memcached (+ McRouter or libmemcached)
One or more cache servers, shared use between clients
Network latency
Distributed cache
31. 31
Features to consider:
Replication
Partitioning
Separate pools
Persistence
Atomic operations
Distributed cache
32. 32
When the value is no longer valid, usually just delete the key
Example:
user_friends:100 => ‘John X, Bob Y, Anne Z’
Need to invalidate when:
The user adds or removes friends
A friend removes the user as a friend
A friend changes their name
Can you tolerate temporary inconsistencies?
Cache invalidation
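The invalidation rules for the `user_friends` example can be sketched in Python as event handlers that delete the affected keys. The dicts stand in for a distributed cache and for the friendship data; the handler names are hypothetical.

```python
cache: dict[str, str] = {}  # stand-in for Redis/Memcached
friends: dict[int, set[int]] = {100: {1, 25}, 1: {100}, 25: {100}}


def friends_key(user_id: int) -> str:
    return f"user_friends:{user_id}"


def on_friendship_changed(user_a: int, user_b: int) -> None:
    """Both users' friend lists changed, so drop both cached entries."""
    cache.pop(friends_key(user_a), None)
    cache.pop(friends_key(user_b), None)


def on_name_changed(user_id: int) -> None:
    """The name appears in every friend's cached list: invalidate each one."""
    for friend_id in friends.get(user_id, ()):
        cache.pop(friends_key(friend_id), None)
```

A name change touches one key per friend. If short-lived inconsistencies are acceptable, a TTL on the key can replace that handler entirely.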
33. 33
What happens if you change the structure of the values? Example:
(old) user_friends:100 => ‘John X, Bob Y, Anne Z’
(new) user_friends:100 => ‘1:John X, 25:Bob Y, 37:Anne Z’
New code breaks on old-style values
Old code breaks on new-style values
Solution: use versions:
(old) user_friends:100:1 => ‘John X, Bob Y, Anne Z’
(new) user_friends:100:2 => ‘1:John X, 25:Bob Y, 37:Anne Z’
Cache versioning
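A minimal Python sketch of versioned keys, using the key layout from this slide. The constant and function names are illustrative.

```python
FRIENDS_CACHE_VERSION = 2  # bump whenever the value format changes


def friends_key(user_id: int, version: int = FRIENDS_CACHE_VERSION) -> str:
    """Build the cache key; old and new code never share a key."""
    return f"user_friends:{user_id}:{version}"


def parse_friends_v2(value: str) -> dict[int, str]:
    """Parse the new format: '1:John X, 25:Bob Y' -> {1: 'John X', 25: 'Bob Y'}."""
    result = {}
    for item in value.split(", "):
        friend_id, name = item.split(":", 1)
        result[int(friend_id)] = name
    return result
```

During a deploy, servers running old code keep reading and writing `user_friends:100:1` while new code uses `user_friends:100:2`, so neither sees a value it cannot parse.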
37. 37
New branch for the feature, merge when finished
Can be fine in the early stages
No extra setup or complexity
Long living branch, may be hard to merge
Development Branch
38. 38
Can be changed at run time (console or configuration)
Should distinguish prod from testing
Allows for intermediate commits
Code structure:
if (feature_enabled('homepage_redesign')) {
    new_homepage();
} else {
    old_homepage();
}
Feature Toggle
39. 39
Dynamically control the percentage of users who get a feature
When increasing the percentage, users who already had the feature should keep it
Code structure:
if (feature_enabled('homepage_redesign', $user_id)) {
    new_homepage();
} else {
    old_homepage();
}
Percentage Rollout
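One way to implement `feature_enabled` so that raising the percentage keeps earlier users is to hash each user into a stable bucket. The Python sketch below assumes that approach; the `ROLLOUT_PERCENT` configuration and the `bucket` helper are hypothetical.

```python
import hashlib

# Hypothetical configuration: feature name -> percentage of users enabled.
ROLLOUT_PERCENT = {"homepage_redesign": 10}


def bucket(feature: str, user_id: int) -> int:
    """Stable bucket in 0..99 derived from the feature name and the user id."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def feature_enabled(feature: str, user_id: int) -> bool:
    """Enabled when the user's bucket is below the configured percentage."""
    return bucket(feature, user_id) < ROLLOUT_PERCENT.get(feature, 0)
```

A user's bucket never changes, so everyone enabled at 10% is still enabled at 50%. Including the feature name in the hash gives each feature a different set of early users.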
40. 40
Turn on/off features for a percentage of users that:
Are employees
Are in another rollout group
Use a certain language
Are in a certain country
Individually whitelist or blacklist people
Advanced Rollout
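These targeting rules can be sketched in Python as a small rule object. The `Rule` format below is hypothetical (real frameworks define their own), and membership in another rollout group is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    is_employee: bool = False
    language: str = "en"
    country: str = "US"


@dataclass
class Rule:
    """Hypothetical rule format for one feature."""
    whitelist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)
    employees_only: bool = False
    languages: set = field(default_factory=set)  # empty = any language
    countries: set = field(default_factory=set)  # empty = any country
    percent: int = 100                           # share of matching users


def enabled(rule: Rule, user: User) -> bool:
    # Individual overrides win over every other condition.
    if user.id in rule.blacklist:
        return False
    if user.id in rule.whitelist:
        return True
    if rule.employees_only and not user.is_employee:
        return False
    if rule.languages and user.language not in rule.languages:
        return False
    if rule.countries and user.country not in rule.countries:
        return False
    # Simplified bucketing; a hash of the id spreads users more evenly.
    return user.id % 100 < rule.percent
```

For example, `Rule(employees_only=True, countries={"AR"}, percent=50)` enables the feature for half of the employees in Argentina.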
41. 41
Some frameworks to check out:
Swivel
Opensoft/rollout
LaunchDarkly
Don’t forget to clean up the old code paths
Introducing new features