4. > Table of contents
1. Insights into the Slack backend
2. Small, easy performance improvements
3. Slightly larger, more difficult improvements
4. Thinking at scale
5. Developing a long-term plan
5. Once upon a time …
some folks created a messaging
platform called Slack.
> Backend architecture overview
11. > Performance monitoring > use cases
• A little obvious:
• Track the performance of an existing codepath
• Track the performance of a new codepath
• Keep an eye on overall application
12. > Performance monitoring > use cases
Given you have access to a system that aggregates
data for consumption later, you just need to:
1. import a library
2. call a function
13. > Performance monitoring > in practice
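loadlib('statsd');
function do_an_expensive_thing($lots_of_data){
    global $env;  # assumes $env (e.g. 'dev' or 'prod') is set elsewhere
    # microtime(true) has sub-second resolution; time() only counts whole seconds.
    $start = microtime(true);
    foreach($lots_of_data as $chunk){
        existing_function_one($chunk);
        existing_function_two();
        statsd_count("$env.expensive_thing.loop_count");
    }
    $end = microtime(true);
    # Assumes statsd_timing takes milliseconds, the usual unit for StatsD timers.
    statsd_timing("$env.expensive_thing.loop_timing", ($end - $start) * 1000);
    return;
}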
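For a sense of what the two calls above might do under the hood: a StatsD client typically formats each metric as a short text line (`name:value|c` for a counter, `name:value|ms` for a timer) and sends it over UDP. The sketch below is in Python rather than the deck's PHP. The helper names (`format_count`, `format_timing`, `send`, `timed`) and the address `127.0.0.1:8125` are assumptions for illustration, not Slack's library.

```python
import socket
import time

STATSD_ADDR = ("127.0.0.1", 8125)  # assumed address; 8125 is the usual StatsD port

def format_count(name, value=1):
    """Counter line in the StatsD text protocol: <name>:<value>|c"""
    return f"{name}:{value}|c"

def format_timing(name, millis):
    """Timer line in the StatsD text protocol: <name>:<value>|ms"""
    return f"{name}:{millis}|ms"

def send(line, addr=STATSD_ADDR):
    """Fire-and-forget UDP send; a dropped datagram only loses one sample."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), addr)

def timed(name, func, *args):
    """Run func, report its wall-clock duration in ms, and return its result."""
    start = time.perf_counter()
    result = func(*args)
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    send(format_timing(name, elapsed_ms))
    return result
```

Because the send is a single UDP datagram, instrumenting a hot loop this way adds very little latency to the request itself.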
17. > Performance monitoring > trade-offs
Cons
• Storing lots of data is expensive
• Knowing what specificity to monitor is tough
Pros
• Better to have too much performance data than too little
• Easy to set up alerting in the background
19. > Add rate limits > use cases
• Used by most companies with public-facing APIs
• Prevent servers from coming under unhealthy load
• Prevent spamming
20. > Add rate limits > in practice
109.29.287 → POST /invite → +1
109.29.287 → POST /invite → 101 times/1 min → 429: Too Many Requests
21. > Add rate limits > in practice
memcached
xoxp-8314999681-… → POST chat.postMessage → +1
hash(xoxp-8314999681-…), chat.postMessage → 101 times/1 min → 429: Too Many Requests, Retry-After: 1 sec
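The flow above (hash the token, count calls per method, answer 429 with a Retry-After header once the count passes the limit) can be sketched as a fixed-window counter. This is a Python illustration under stated assumptions: a plain dict stands in for memcached, and the class name `FixedWindowLimiter`, the 100-per-minute limit, and the key layout are hypothetical, not Slack's implementation.

```python
import hashlib
import time

class FixedWindowLimiter:
    """Counts calls per (token, method) in fixed windows; a dict stands in for memcached."""

    def __init__(self, limit=100, window_secs=60, clock=time.time):
        self.limit = limit
        self.window_secs = window_secs
        self.clock = clock
        self.counters = {}  # key -> count; memcached would expire old keys itself

    def _key(self, token, method, now):
        # Hash the token so the raw credential never becomes a cache key.
        digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
        window = int(now // self.window_secs)
        return f"{digest}:{method}:{window}"

    def check(self, token, method):
        """Return (status, headers): 200 if allowed, 429 with Retry-After if not."""
        now = self.clock()
        key = self._key(token, method, now)
        self.counters[key] = self.counters.get(key, 0) + 1  # the "+1" step
        if self.counters[key] > self.limit:
            # Seconds until the current window rolls over, at least 1.
            retry_after = max(1, int(self.window_secs - now % self.window_secs))
            return 429, {"Retry-After": str(retry_after)}
        return 200, {}
```

A fixed window is the simplest variant. It allows short bursts at window boundaries, which a sliding window or token bucket would smooth out at the cost of more bookkeeping.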
22. > Add rate limits > at Slack
• Externally (API) & internally (enqueuing jobs)
• Rate limit individual methods keyed by some
entity (user, team, channel, globally, etc.)
24. > Add rate limits > trade-offs
Cons
• Picking the right number is difficult
• Risk potentially annoying your users
Pros
• Always safer to have them in place
• Easily tweaked to handle growing # of users
• Easier to opt users into rate limits from the beginning
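27. > Paginate > in practice
function do_a_thing_for_many_channels($channels){
    $all_members = array();
    $channel_ids = array_column($channels, 'id');
    # Query in chunks of 1,000 IDs so no single query grows unbounded.
    foreach(array_chunk($channel_ids, 1000) as $chunk){
        # Assumes integer IDs; otherwise escape them with your DB layer.
        $id_list = implode(',', array_map('intval', $chunk));
        # Assumes query() returns an array of rows.
        $rows = query("SELECT * FROM members
            WHERE channel_id IN ($id_list)");
        $all_members = array_merge($all_members, $rows);
    }
    return $all_members;
}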
28. > Cheaper by the dozen > at Slack
https://api.slack.com/methods/users.list
29. > Paginate > trade-offs
Cons
• Picking the right number is hard, but thankfully not a big deal
• Clients may need to make multiple calls to get full lists
Pros
• Always safer to have it in place
• Easier to opt users into pagination from the beginning
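The "multiple calls" cost on the client side usually amounts to a short loop that follows a cursor until the server stops returning one. The Python sketch below is an illustration only: `fetch_page` is a hypothetical in-memory stand-in for one API call, and the response shape (`members`, `response_metadata.next_cursor`) is modeled on cursor-style paginated responses such as `users.list`.

```python
def fetch_page(all_items, cursor="", limit=2):
    """Stand-in for one paginated API call (hypothetical; no network involved).

    The cursor here is just a stringified offset; real cursors are opaque.
    """
    start = int(cursor) if cursor else 0
    end = start + limit
    next_cursor = str(end) if end < len(all_items) else ""
    return {
        "members": all_items[start:end],
        "response_metadata": {"next_cursor": next_cursor},
    }

def fetch_all(all_items, limit=2):
    """Follow next_cursor until the server returns an empty one."""
    collected = []
    cursor = ""
    calls = 0
    while True:
        page = fetch_page(all_items, cursor=cursor, limit=limit)
        calls += 1
        collected.extend(page["members"])
        cursor = page["response_metadata"]["next_cursor"]
        if not cursor:
            return collected, calls
```

With five items and a page size of two, `fetch_all` makes three calls. Clients that only need the first page pay for one.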
31. > Cache, cache, cache > use cases
Generally, it is best to cache one or a combination of:
• data that changes very little
• data that is expensive to calculate
• data that is accessed often
• data derived from multiple sources
32. > Cache, cache, cache > use cases
How old is my mom?
38. > Cache, cache, cache > trade-offs
Cons
• Invalidation is hard
• Storing lots of data in a remote caching tier is expensive
Pros
• Generally, it’s an easy, quick win to store reasonable amounts of data within the context of your request
• Although the speed of a request might not increase, you’re saving load on the database
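The "within the context of your request" win can be as small as a dictionary that lives for one request. The Python sketch below is a minimal illustration. The class name `RequestCache` and its `loader` callback are hypothetical, and the `loads` counter exists only to show how many database hits were saved.

```python
class RequestCache:
    """Memoizes lookups for the lifetime of one request; discard it afterwards."""

    def __init__(self, loader):
        self.loader = loader  # function that does the expensive fetch
        self.values = {}
        self.loads = 0  # how many times we actually hit the backing store

    def get(self, key):
        if key not in self.values:
            self.values[key] = self.loader(key)
            self.loads += 1
        return self.values[key]

    def invalidate(self, key):
        """Drop a key after a write so the next read refetches it."""
        self.values.pop(key, None)
```

The invalidation con shows up even here: if the underlying row changes mid-request, `get` keeps returning the old value until `invalidate` is called.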
40. > Make it asynchronous > use cases
• You have a task that:
  • can be split into independent subtasks
  • doesn’t need to complete right away
41. > Make it asynchronous > use cases
Making a pie
42. > Make it asynchronous > use cases
just me:
1. prepare the crust
2. prepare the filling
3. combine
4. bake
with some help:
1. prepare the crust and prepare the filling (at the same time)
2. combine
3. bake
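The pie analogy maps directly onto concurrent awaits: the two independent steps run together, and the dependent steps wait for both. Here is a minimal Python sketch using `asyncio.gather`. The step names come from the slide, while the function names and the sleep durations are made up to stand in for real I/O.

```python
import asyncio
import time

async def prepare(step, secs):
    """Pretend to work on one step; sleeping stands in for real I/O."""
    await asyncio.sleep(secs)
    return step

async def make_pie():
    start = time.perf_counter()
    # Crust and filling do not depend on each other, so run them concurrently.
    crust, filling = await asyncio.gather(
        prepare("crust", 0.3),
        prepare("filling", 0.3),
    )
    # Combining and baking need both results, so they stay sequential.
    combined = await prepare(f"{crust}+{filling}", 0.05)
    pie = await prepare(f"baked({combined})", 0.05)
    return pie, time.perf_counter() - start

pie, elapsed = asyncio.run(make_pie())
```

Run one after another, the four steps would take about 0.7 seconds. With the first two overlapped, the total is closer to 0.4.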
43. > Make it asynchronous > in practice > language constructs
async function curl_A(): Awaitable<string> {
  $x = await \HH\Asio\curl_exec("http://example.com/");
  return $x;
}
async function curl_B(): Awaitable<string> {
  $y = await \HH\Asio\curl_exec("http://example.net/");
  return $y;
}
async function async_curl(): Awaitable<void> {
  $start = microtime(true);
  list($a, $b) = await \HH\Asio\v(array(curl_A(), curl_B()));
  $end = microtime(true);
  echo "Total time taken: " . strval($end - $start) . " seconds" . PHP_EOL;
}
\HH\Asio\join(async_curl());
44. > Make it asynchronous > in practice > job queue
function do_something_asynchronously($something){
    # Get some data.
    $other_data = get_more_than_something($something);
    job_queue_enqueue('async_task', 'special_queue', array(
        'something' => $something,
        'other_data' => $other_data,
    ));
    return;
}
45. > Make it asynchronous > at Slack
channels.archive
46. > Make it asynchronous > at Slack
function channel_archive($channel, $archiver){
    # Get members
    foreach($members as $member){
        send_message($member, "Channel {$channel['name']} was archived.");
        remove_member($member, $channel);
        send_client_event($member, "leave_channel");
    }
    send_message($archiver, "Channel {$channel['name']} was archived.");
    return;
}
47. > Make it asynchronous > at Slack
What happens when we archive a channel with 500 users?
48. > Make it asynchronous > at Slack
function channel_archive($channel, $archiver){
    schedule_job(($channel, $archiver) ==> channel_archive_process_members(
        $channel, $archiver));
    send_message($archiver, "Channel {$channel['name']} is being archived.");
    return;
}
function channel_archive_process_members($channel, $archiver){
    # Get members
    foreach($members as $member){
        send_message($member, "Channel {$channel['name']} was archived.");
        remove_member($member, $channel);
        send_client_event($member, "leave_channel");
    }
    send_message($archiver, "Channel {$channel['name']} was archived.");
    return;
}
49. > Make it asynchronous > at Slack
What happens when we archive a channel with 17,000 users?
50. > Make it asynchronous > at Slack
function channel_archive($channel, $archiver){
    schedule_job(($channel, $archiver) ==> channel_archive_process_members(
        $channel, $archiver));
    send_message($archiver, "Channel {$channel['name']} is being archived.");
    return;
}
function channel_archive_process_members($channel, $archiver){
    # Get members
    foreach($members as $member){
        # One small job per member instead of one long-running job.
        schedule_job(($channel, $member) ==> channel_archive_process_single_member(
            $channel, $member), 200);
    }
    return;
}
function channel_archive_process_single_member($channel, $member){
    send_message($member, "Channel {$channel['name']} was archived.");
    remove_member($member, $channel);
    send_client_event($member, "leave_channel");
    return;
}
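The fan-out idea (one job that only enqueues many small per-member jobs) can be tried out with an in-process queue. The Python sketch below is a toy illustration under stated assumptions: a `deque` stands in for the real job queue, `schedule_job` and `run_worker` are hypothetical helpers, and the `events` list replaces the real message and membership calls.

```python
from collections import deque

queue = deque()   # stand-in for a real job queue
events = []       # records what each job did, for inspection

def schedule_job(func, *args):
    """Enqueue work instead of running it inline (hypothetical helper)."""
    queue.append((func, args))

def channel_archive(channel, archiver):
    schedule_job(process_members, channel)
    events.append(f"notify {archiver}: {channel['name']} is being archived")

def process_members(channel):
    # Fan out: one small job per member instead of one long-running job.
    for member in channel["members"]:
        schedule_job(process_single_member, channel, member)

def process_single_member(channel, member):
    events.append(f"remove {member} from {channel['name']}")

def run_worker():
    """Drain the queue in FIFO order; returns the number of jobs executed."""
    executed = 0
    while queue:
        func, args = queue.popleft()
        func(*args)
        executed += 1
    return executed
```

Small jobs can be retried individually and spread across workers, but a 17,000-member channel now produces 17,001 queue entries, which is where queue depth becomes a concern.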
51. > Make it asynchronous > at Slack
Cons
• Unexpected behavior due to ordering
• Fault tolerance becomes tricky
• Too many jobs can cause overflowing queues
Pros
• If you know the operation could take a while and the result of the operation has no bearing on subsequent code, then offload to a queue.
58. > Tiered degradation plan > in practice
1. Absolutely necessary
2. Popular features
3. Mostly everything else
59. > Tiered degradation plan > in practice
• Set job queue priorities
• Ensure clients back off exponentially from struggling API endpoints
• Use internal rate limits
• Respond appropriately to error codes
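Two of the bullets above, exponential back-off and responding appropriately to error codes, fit in one small client-side retry loop. The Python sketch below is an illustration under stated assumptions: the function names, the 1-second base, the 60-second cap, and the use of full jitter are choices made for the example, not something the slides specify.

```python
import random

def backoff_delays(base=1.0, cap=60.0, attempts=5, rng=random.random):
    """Yield one delay per retry: full jitter over an exponentially growing ceiling."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling

def call_with_backoff(request, sleep, attempts=5):
    """Retry on 429/5xx, honoring Retry-After when the server sends one."""
    status, headers = request()
    for delay in backoff_delays(attempts=attempts):
        if status != 429 and status < 500:
            break  # success or a non-retryable client error
        retry_after = headers.get("Retry-After")
        sleep(float(retry_after) if retry_after is not None else delay)
        status, headers = request()
    return status
```

In real use, `request` would wrap the HTTP call and `sleep` would be `time.sleep`. Passing them in keeps the loop easy to test without waiting.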
60. > Tiered degradation plan > at Slack
We’ve yet to figure this out ourselves!
61. > Tiered degradation plan > use cases
Cons
• Building out a tiered degradation plan takes lots of effort from both product and engineering folks across the entire company.
Pros
• If it’s already in place, understanding where your feature fits in is the key to ensuring the best experience for your users.