Highly concurrent yet natural
programming
mefyl
quentin.hocquet@infinit.io
Version 1.2
Infinit & me
Me
• Quentin "mefyl" Hocquet
• Epita CSI (LRDE) 2008.
• Ex Gostai
• Into language theory
• Joined Infinit early two years ago.
Infinit
• Founded by Julien "mycure" Quintard, Epita SRS 2007
• Based on his thesis at Cambridge
• Decentralized filesystem in byzantine environment
• Frontend: file transfer application based on the technology.
• Strong technical culture
Concurrent and parallel
programming
Know the difference
Parallel programming
Aims at running two tasks simultaneously. It is a matter of performance.
Concurrent programming
Aims at running two tasks without one blocking the other. It is a matter of behavior.
Know the difference
[Diagram: Task 1 and Task 2 scheduled sequentially, concurrently and in parallel]
                          Sequential   Concurrent   Parallel
CPU usage                 N            N            N
Execution time            Long         Short        Shorter
Need to run in parallel   No           No           Yes
Some real life examples
You are the CPU. You want to:
• Watch a film on TV.
• Peel potatoes.
Sequential: TV, commercials, TV, peeling.
Concurrent: TV, peeling, TV, peeling.
Parallel: TV and peeling at the same time, commercials, TV.
Some real life examples
You are the CPU. You want to:
• Do the laundry.
• Do the dishes.
Sequential: load, unload, load, unload.
Concurrent: load, load, unload, unload.
Parallel: load both at the same time, then unload both at the same time.
Some programming examples
Video encoding: encode a raw 2 GB file to MP4.
• CPU bound.
• File chunks can be encoded separately and then merged later.
Sequential or concurrent: encode the first half, then the second half.
Parallel: encode both halves at the same time.
Parallelism is a plus, concurrency doesn't apply.
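To make "chunks encoded separately, then merged" concrete, here is a minimal sketch using std::thread. The encode_chunk function is a hypothetical stand-in for a real CPU-bound encoder (it just adds 1 to every byte), and encode_parallel is an illustrative name, not part of any library discussed in this talk.

```cpp
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a CPU-bound encoder: "encodes" the bytes in
// [begin, end) by adding 1 to each of them.
std::vector<std::uint8_t>
encode_chunk(std::vector<std::uint8_t> const& raw,
             std::size_t begin, std::size_t end)
{
  std::vector<std::uint8_t> out;
  out.reserve(end - begin);
  for (std::size_t i = begin; i < end; ++i)
    out.push_back(static_cast<std::uint8_t>(raw[i] + 1));
  return out;
}

// Encode both halves at the same time, then merge the results in order.
std::vector<std::uint8_t>
encode_parallel(std::vector<std::uint8_t> const& raw)
{
  std::size_t const half = raw.size() / 2;
  std::vector<std::uint8_t> first;
  std::vector<std::uint8_t> second;
  // The worker thread handles the second half...
  std::thread worker(
    [&] { second = encode_chunk(raw, half, raw.size()); });
  // ... while the calling thread handles the first half.
  first = encode_chunk(raw, 0, half);
  worker.join();
  first.insert(first.end(), second.begin(), second.end());
  return first;
}
```

The two halves share no mutable state, which is why no locking is needed here; on a single core the result is the same, only slower.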
Some programming examples
An IRC server: handle up to 50k IRC users chatting.
• IO bound.
• A huge number of clients that must be handled concurrently and that are mostly waiting.
Concurrency is needed, parallelism is superfluous.
Know the difference
Parallelism
• Is never needed for correctness.
• Is about performance, not correct behavior.
• Is about exploiting multi-core and multi-CPU architectures.
Concurrent programming
• Can be needed for correctness.
• Is about correct behavior, sometimes about performance too.
• Is about multiple threads staying responsive while running concurrently.
A good video encoding app:
• Encodes 4 times faster on a 4-core CPU. That's parallelism.
• Has a responsive GUI while encoding. That's concurrency.
Who's best ?
If you are parallel, you are concurrent. So why bother ?
• Being parallel is much, much more difficult. That's time, money and
programmer misery.
• You can't be efficiently parallel past your hardware limit. Those are system
calls, captain.
Threads, callbacks
So, how do you write an echo server ?
The sequential echo server
TCPServer server;
server.listen(4242);
while (true)
{
  TCPSocket client = server.accept();
  // The try block below is what serve_client(client) does.
  try
  {
    while (true)
    {
      std::string line = client.read_until("\n");
      client.send(line);
    }
  }
  catch (ConnectionClosed const&)
  {}
}
The sequential echo server
TCPServer server;
server.listen(4242);
while (true)
{
  TCPSocket client = server.accept();
  serve_client(client);
}
• Dead simple: you got it instantly. It's natural.
• But wrong: we handle only one client at a time.
• We need ... concurrency !
The parallel echo server
TCPServer server;
server.listen(4242);
std::vector<std::thread> threads;
while (true)
{
  TCPSocket client = server.accept();
  // A std::thread starts running as soon as it is constructed.
  // The socket is moved into the lambda: capturing it by reference
  // would dangle once this loop iteration ends.
  std::thread client_thread(
    [client = std::move(client)] () mutable
    {
      serve_client(client);
    });
  threads.push_back(std::move(client_thread));
}
• Almost as simple and still natural.
• To add the concurrency property, we just added a concurrency construct to the existing code.
But parallelism is too much
• Not scalable: you can't run 50k threads.
• Induces unwanted complexity: race conditions.
Sequential, the counter is safe:
int line_count = 0;
while (true)
{
  TCPSocket client = server.accept();
  while (true)
  {
    std::string line = client.read_until("\n");
    client.send(line);
    ++line_count;
  }
}
With one thread per client, the same counter is a data race:
int line_count = 0;
while (true)
{
  TCPSocket client = server.accept();
  std::thread client_thread(
    [&line_count, client = std::move(client)] () mutable
    {
      while (true)
      {
        std::string line = client.read_until("\n");
        client.send(line);
        // Several threads increment this without synchronization.
        ++line_count;
      }
    });
  client_thread.detach();
}
We need concurrency without threads
We need to accept, read from and write to sockets without threads, and therefore without blocking.
• Use select to monitor all sockets at once.
• Register actions to be done when something is ready.
• Wake up only when something needs to be performed.
This is abstracted by the reactor design pattern:
• libevent
• Boost ASIO
• Python Twisted
• ...
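The three steps above can be sketched with plain POSIX select. This is a toy, not how libevent or Boost ASIO are implemented: MiniReactor, echo_handler and echo_through_reactor are hypothetical names made up for this illustration, and only read readiness is handled.

```cpp
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal reactor: maps file descriptors to callbacks that
// run when the descriptor becomes readable.
class MiniReactor
{
public:
  using Callback = std::function<void (int)>;

  // Register the action to perform when fd is ready for reading.
  void on_readable(int fd, Callback cb) { _handlers[fd] = std::move(cb); }

  // Sleep in select() until something is readable, then dispatch.
  // Returns the number of callbacks invoked, or -1 on error.
  int run_once()
  {
    if (_handlers.empty())
      return 0;
    fd_set read_set;
    FD_ZERO(&read_set);
    int max_fd = -1;
    for (auto const& h: _handlers)
    {
      FD_SET(h.first, &read_set);
      if (h.first > max_fd)
        max_fd = h.first;
    }
    if (select(max_fd + 1, &read_set, nullptr, nullptr, nullptr) < 0)
      return -1;
    // Copy what is ready first: a callback may register new handlers.
    std::vector<std::pair<int, Callback>> ready;
    for (auto const& h: _handlers)
      if (FD_ISSET(h.first, &read_set))
        ready.emplace_back(h.first, h.second);
    for (auto& r: ready)
      r.second(r.first);
    return static_cast<int>(ready.size());
  }

private:
  std::map<int, Callback> _handlers;
};

// Echo whatever can be read from fd back to fd.
void echo_handler(int fd)
{
  char buf[256];
  ssize_t n = read(fd, buf, sizeof buf);
  if (n > 0)
  {
    ssize_t written = write(fd, buf, static_cast<std::size_t>(n));
    (void) written;
  }
}

// Demo: send a message through a socket pair and read the echo back.
std::string echo_through_reactor(std::string const& message)
{
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
    return "";
  MiniReactor reactor;
  reactor.on_readable(fds[1], &echo_handler);
  ssize_t sent = write(fds[0], message.data(), message.size());
  std::string reply;
  if (sent == static_cast<ssize_t>(message.size())
      && reactor.run_once() == 1)
  {
    char buf[256];
    ssize_t n = read(fds[0], buf, sizeof buf);
    if (n > 0)
      reply.assign(buf, static_cast<std::size_t>(n));
  }
  close(fds[0]);
  close(fds[1]);
  return reply;
}
```

A real server would call run_once in a loop and also register the listening socket, whose callback accepts the connection and registers the new client.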
The callback-based echo server
void handle_connection(TCPSocket& client);
void handle_read(TCPSocket& c, std::string const& l, Error e);
void handle_sent(TCPSocket& client, Error error);
Reactor reactor;
TCPServer server(reactor);
server.accept(&handle_connection);
reactor.run();
void
handle_connection(TCPSocket& client)
{
  client.read_until("\n", &handle_read);
}
void
handle_read(TCPSocket& c, std::string const& l, Error e)
{
  if (!e)
    c.send(l, &handle_sent);
}
void
handle_sent(TCPSocket& client, Error error)
{
  if (!error)
    client.read_until("\n", &handle_read);
}
How do we feel now ?
• This one scales to thousands of clients.
• Yet to add the concurrency property, we had to completely change the way we think.
• A bit more verbose and complex, but nothing too bad ... right ?
Counting lines with threads
int lines_count = 0;
try
{
  while (true)
  {
    std::string line = client.read_until("\n");
    ++lines_count;
    client.send(line);
  }
}
catch (ConnectionClosed const&)
{
  std::cerr << "Client sent " << lines_count << " lines\n";
}
Counting lines with callbacks
using namespace std::placeholders;
void handle_read(TCPSocket& c, std::string const& l,
                 Error e, int* count);
void handle_sent(TCPSocket& client, Error error, int* count);
void
handle_connection(TCPSocket& client)
{
  // The counter must outlive this function, so it lives on the heap.
  int* count = new int(0);
  client.read_until(
    "\n", std::bind(&handle_read, _1, _2, _3, count));
}
void
handle_read(TCPSocket& c, std::string const& l,
            Error e, int* count)
{
  if (e)
  {
    std::cerr << *count << std::endl;
    delete count;
  }
  else
  {
    ++*count;
    c.send(l, std::bind(&handle_sent, _1, _2, count));
  }
}
void
handle_sent(TCPSocket& client, Error error, int* count)
{
  if (error)
  {
    std::cerr << *count << std::endl;
    delete count;
  }
  else
    client.read_until(
      "\n", std::bind(&handle_read, _1, _2, _3, count));
}
Callback-based programming considered harmful
• Code is structured with callbacks.
• Asynchronous operations break the flow arbitrarily.
• You lose everything that syntactic scoping expresses (local variables, closures, exceptions, ...).
• This is not natural. Damn, this is pretty much as bad as GOTO.
Are we screwed ?
Threads
• Respect your beloved semantics and expressiveness.
• Don't scale and introduce race conditions.
Callbacks
• Scale.
• Ruin your semantics. Painful to write, close to impossible to maintain.
I lied when I said: we need concurrency without threads.
We need concurrency without system threads.
Coroutines
Also known as:
• green threads
• userland threads
• fibers
• contexts
• ...
Coroutines
• Separate execution contexts like system threads.
• Userland: no need to ask the kernel.
• Non-parallel.
• Cooperative instead of preemptive: they yield to each other.
By building on top of that, we have:
• Scalability: no system thread involved.
• No arbitrary race-conditions: no parallelism.
• A stack, a context: the code is natural.
Coroutines-based scheduler
• Make a scheduler that holds coroutines.
• Embed a reactor in there.
• Write a neat Socket class. When a coroutine reads from it, the coroutine:
◦ Unschedules itself.
◦ Asks the reactor to read.
◦ Passes a callback to reschedule itself.
◦ Yields control back.
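To give an idea of what "a scheduler that holds coroutines" can look like, here is a minimal round-robin sketch built on POSIX ucontext (makecontext / swapcontext). It is an assumption-laden toy, not the Infinit reactor: MiniScheduler, spawn, yield and hello_bye_demo are hypothetical names, there is no embedded reactor, and coroutines only yield explicitly.

```cpp
#include <ucontext.h>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal cooperative scheduler: userland, non-parallel.
class MiniScheduler
{
public:
  using Body = std::function<void ()>;

  // Create a coroutine with its own stack and queue it.
  void spawn(Body body)
  {
    std::unique_ptr<Coroutine> co(new Coroutine);
    co->body = std::move(body);
    co->stack.resize(64 * 1024);
    getcontext(&co->context);
    co->context.uc_stack.ss_sp = co->stack.data();
    co->context.uc_stack.ss_size = co->stack.size();
    // When the body returns, resume the scheduler loop in run().
    co->context.uc_link = &_main;
    makecontext(&co->context, &MiniScheduler::trampoline, 0);
    _ready.push_back(co.get());
    _all.push_back(std::move(co));
  }

  // Called from inside a coroutine: requeue it and switch back to run().
  void yield()
  {
    Coroutine* self = _current;
    _ready.push_back(self);
    swapcontext(&self->context, &_main);
  }

  // Resume ready coroutines one after another until none is left.
  void run()
  {
    while (!_ready.empty())
    {
      _current = _ready.front();
      _ready.pop_front();
      active() = this;
      swapcontext(&_main, &_current->context);
    }
    _current = nullptr;
  }

private:
  struct Coroutine
  {
    Body body;
    std::vector<char> stack;
    ucontext_t context;
  };

  // Scheduler whose coroutine is currently being started.
  static MiniScheduler*& active()
  {
    static MiniScheduler* scheduler = nullptr;
    return scheduler;
  }

  // Entry point of every coroutine: run the body of the current one.
  static void trampoline() { active()->_current->body(); }

  std::deque<Coroutine*> _ready;
  std::vector<std::unique_ptr<Coroutine>> _all;
  Coroutine* _current = nullptr;
  ucontext_t _main;
};

// Demo: two coroutines that each log, yield once, then log again.
std::vector<std::string> hello_bye_demo()
{
  MiniScheduler sched;
  std::vector<std::string> log;
  sched.spawn([&] {
    log.push_back("Hello 1"); sched.yield(); log.push_back("Bye 1"); });
  sched.spawn([&] {
    log.push_back("Hello 2"); sched.yield(); log.push_back("Bye 2"); });
  sched.run();
  return log;
}
```

In a fuller version, a blocking socket read would not requeue the coroutine in yield; it would leave it out of the ready queue and let the reactor's completion callback put it back.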
Coroutines-based echo server
TCPServer server; server.listen(4242);
std::vector<Thread> threads;
int lines_count = 0;
while (true)
{
  TCPSocket client = server.accept();
  Thread t([&lines_count, client = std::move(client)] () mutable {
    try
    {
      while (true)
      {
        std::string line = client.read_until("\n");
        // No race here: coroutines never run in parallel.
        ++lines_count;
        client.send(line);
      }
    }
    catch (ConnectionClosed const&) {}
  });
  threads.push_back(std::move(t));
}
What we built at Infinit: the reactor.
• Coroutine scheduler: simple round robin
• Sleeping, waiting
• Timers
• Synchronization
• Mutexes, semaphores
• TCP networking
• SSL
• UPnP
• HTTP client (Curl based)
Coroutine scheduling
reactor::Scheduler sched;
reactor::Thread t1(sched,
  [&]
  {
    print("Hello 1");
    reactor::yield();
    print("Bye 1");
  });
reactor::Thread t2(sched,
  [&]
  {
    print("Hello 2");
    reactor::yield();
    print("Bye 2");
  });
sched.run();
Output:
Hello 1
Hello 2
Bye 1
Bye 2
Sleeping and waiting
reactor::Thread t1(sched,
  [&]
  {
    print("Hello 1");
    reactor::sleep(500_ms);
    print("Bye 1");
  });
reactor::Thread t2(sched,
  [&]
  {
    print("Hello 2");
    reactor::yield();
    print("World 2");
    reactor::yield();
    print("Bye 2");
  });
Output ("Bye 1" comes last, once the 500 ms sleep is over):
Hello 1
Hello 2
World 2
Bye 2
Bye 1
Sleeping and waiting
reactor::Thread t1(sched,
  [&]
  {
    print("Hello 1");
    reactor::sleep(500_ms);
    print("Bye 1");
  });
reactor::Thread t2(sched,
  [&]
  {
    print("Hello 2");
    reactor::yield();
    print("World 2");
    reactor::wait(t1); // Wait
    print("Bye 2");
  });
Output (t2 prints "Bye 2" only after t1 has finished):
Hello 1
Hello 2
World 2
Bye 1
Bye 2
Synchronization: signals
reactor::Signal task_available;
std::vector<Task> tasks;
reactor::Thread handler([&] {
  while (true)
  {
    if (!tasks.empty()) // 1
    {
      // Take ownership of the pending tasks and leave the queue empty.
      std::vector<Task> mytasks = std::move(tasks);
      tasks.clear();
      for (auto& task: mytasks)
        ; // Handle task
    }
    else
      reactor::wait(task_available); // 4
  }
});
tasks.push_back(...); // 2
task_available.signal(); // 3
Synchronization: channels
reactor::Channel<Task> tasks;
reactor::Thread handler([&] {
while (true)
{
Task t = tasks.get();
// Handle task
}
});
tasks.put(...);
Mutexes
But you said no race conditions! You lied again!
reactor::Thread t([&] {
  while (true)
  {
    // send() may yield, so another coroutine can modify sockets
    // in the middle of this iteration.
    for (auto& socket: sockets)
      socket.send("YO");
  }
});
{
  sockets.push_back(...);
}
With an explicit mutex:
reactor::Mutex mutex;
reactor::Thread t([&] {
  while (true)
  {
    reactor::wait(mutex);
    for (auto& socket: sockets)
      socket.send("YO");
    mutex.unlock();
  }
});
{
  reactor::wait(mutex);
  sockets.push_back(...);
  mutex.unlock();
}
With a scoped lock:
reactor::Mutex mutex;
reactor::Thread t([&] {
  while (true)
  {
    reactor::Lock lock(mutex);
    for (auto& socket: sockets)
      socket.send("YO");
  }
});
{
  reactor::Lock lock(mutex);
  sockets.push_back(...);
}
Networking: TCP
We saw a good deal of TCP networking:
try
{
reactor::TCPSocket socket("battle.net", 4242, 10_sec);
// ...
}
catch (reactor::network::ResolutionFailure const&)
{
// ...
}
catch (reactor::network::Timeout const&)
{
// ...
}
Networking: TCP
We saw a good deal of TCP networking:
void
serve(TCPSocket& client)
{
  try
  {
    std::string auth = client.read_until("\n", 10_sec);
    if (!check_auth(auth))
      // Impossible with callbacks
      throw InvalidCredentials();
    while (true) { ... }
  }
  catch (reactor::network::Timeout const&)
  {}
}
Networking: SSL
Transparent client handshaking:
reactor::network::SSLSocket socket("localhost", 4242);
socket.write(...);
Networking: SSL
Transparent server handshaking:
reactor::network::SSLServer server(certificate, key);
server.listen(4242);
while (true)
{
auto socket = server.accept();
reactor::Thread([&] { ... });
}
Networking: SSL
Transparent server handshaking:
SSLSocket SSLServer::accept()
{
  auto socket = this->_tcp_server.accept();
  // SSL handshake
  return socket;
}
Networking: SSL
Transparent server handshaking:
reactor::Channel<SSLSocket> _sockets;
void SSLServer::_handshake_thread()
{
  while (true)
  {
    auto socket = this->_tcp_server.accept();
    // SSL handshake
    this->_sockets.put(std::move(socket));
  }
}
SSLSocket SSLServer::accept()
{
  return this->_sockets.get();
}
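Networking: SSL
Transparent server handshaking:
void SSLServer::_handshake_thread()
{
  while (true)
  {
    auto socket = this->_tcp_server.accept();
    // One coroutine per handshake. The socket is moved into the lambda:
    // a reference to the loop variable would dangle.
    reactor::Thread t(
      [this, socket = std::move(socket)] () mutable
      {
        // SSL handshake
        this->_sockets.put(std::move(socket));
      });
  }
}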
HTTP
std::string google = reactor::http::get("google.com");
reactor::http::Request r("kissmetrics.com/api",
                         reactor::http::Method::PUT,
                         "application/json",
                         5_sec);
r.write("{ event: \"login\"}");
reactor::wait(r);
• Chunking
• Cookies
• Custom headers
• Upload/download progress
• ... pretty much anything Curl supports (i.e., everything)
HTTP streaming
Buffered download:
std::string content = reactor::http::get(
  "my-api.infinit.io/transactions");
auto json = json::parse(content);
Streamed download:
reactor::http::Request r(
  "my-api.production.infinit.io/transactions");
assert(r.status() == reactor::http::Status::OK);
// JSON is parsed on the fly;
auto json = json::parse(r);
Streamed upload:
reactor::http::Request r(
  "youtube.com/upload", reactor::http::Method::PUT);
std::ifstream input("~/A new hope - BrRIP.mp4");
std::copy(input, r);
Better concurrency: futures, ...
Version 1: each request blocks until it completes.
std::string transaction_id = reactor::http::put(
  "my-api.production.infinit.io/transactions");
// Ask the user which files to share.
reactor::http::post("my-api.infinit.io/transaction/", file_list);
std::string s3_token = reactor::http::get(
  "s3.aws.amazon.com/get_token?key=...");
// Upload files to S3
Version 2: start the requests first, fetch their results only when needed.
reactor::http::Request transaction(
  "my-api.production.infinit.io/transactions");
reactor::http::Request s3(
  "s3.aws.amazon.com/get_token?key=...");
// Ask the user which files to share.
auto transaction_id = transaction.content();
reactor::http::Request list(
  "my-api.infinit.io/transaction/", file_list);
auto s3_token = s3.content();
// Upload files to S3
[Timeline diagram. Version 1: Wait meta, Ask files, Wait meta, Wait AWS, one after another. Version 2: only Ask files appears on the timeline.]
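The same "start early, collect late" idea can be tried with the standard library alone. The sketch below uses std::async and std::future, which rely on system threads rather than coroutines, so it only illustrates the shape of the code. The two fetch functions are hypothetical stand-ins for the HTTP requests and return made-up values.

```cpp
#include <future>
#include <string>
#include <utility>

// Hypothetical stand-ins for two independent, slow HTTP requests.
std::string fetch_transaction_id() { return "transaction-42"; }
std::string fetch_s3_token() { return "s3-token"; }

// Start both requests up front, then collect each result when needed.
std::pair<std::string, std::string> prepare_upload()
{
  std::future<std::string> transaction =
    std::async(std::launch::async, fetch_transaction_id);
  std::future<std::string> s3 =
    std::async(std::launch::async, fetch_s3_token);
  // Asking the user which files to share would happen here, while
  // both requests are already in flight.
  std::string transaction_id = transaction.get(); // blocks only if not done
  std::string s3_token = s3.get();
  return std::make_pair(transaction_id, s3_token);
}
```

The total waiting time tends toward the slowest request plus the user interaction, instead of the sum of all of them.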
How does it perform for us ?
• Notification server does perform:
◦ 10k clients per instance
◦ 0.01 load average
◦ 1 GB resident memory
◦ Cheap single-core 2.5 GHz instance (EC2)
• Life is so much better:
◦ Code is easy and pleasant to write and read
◦ Everything is maintainable
◦ Send metrics on login without slowdown? No biggie.
◦ Try connecting to several interfaces and keep the first to respond? No biggie.
Questions ?

Highly concurrent yet natural programming

  • 1.
    Highly concurrent yetnatural programming mefyl quentin.hocquet@infinit.io Version 1.2
  • 2.
    Infinit & me Me •Quentin "mefyl" Hocquet • Epita CSI (LRDE) 2008. • Ex Gostai • Into language theory • Joined Infinit early two years ago.
  • 3.
    Infinit & me Me •Quentin "mefyl" Hocquet • Epita CSI (LRDE) 2008. • Ex Gostai • Into language theory • Joined Infinit early two years ago. Infinit • Founded my Julien "mycure" Quintard, Epita SRS 2007 • Based on his thesis at Cambridge • Decentralized filesystem in byzantine environment • Frontend: file transfer application based on the technology. • Strong technical culture
  • 4.
  • 5.
    Know the difference Parallelprogramming Aims at running two tasks simultaneously. It is a matter of performances. Concurrent programming Aims at running two tasks without inter-blocking. It is a matter of behavior.
  • 6.
    Task 1 Task2 Know the difference
  • 7.
    Task 1 Task2 Know the difference Sequential
  • 8.
    Task 1 Task2 Know the difference Parallel
  • 9.
    Task 1 Task2 Know the difference Concurrent
  • 10.
    Sequential Concurrent Know thedifference Parallel
  • 11.
    Sequential Concurrent Know thedifference Parallel Sequential Concurrent Parallel CPU usage N N N Execution time Long Short Shorter
  • 12.
    Sequential Concurrent Know thedifference Parallel Sequential Concurrent Parallel CPU usage N N N Execution time Long Short Shorter Need to run in parallel No No Yes
  • 13.
    TV Commercials TV Peeling Some real lifeexamples You are the CPU. You want to: • Watch a film on TV. • Peel potatoes.
  • 14.
  • 15.
    Load Unload Load Unload Some real lifeexamples You are the CPU. You want to: • Do the laundry. • Do the dishes.
  • 16.
  • 17.
    Some programming examples Videoencoding: encode a raw 2GB raw file to mp4. • CPU bound. • File chunks can be encoded separately and then merged later.
  • 18.
    Parallel Encode first half Encode second half Sequential Concurrent Encodefirst half Encode second half Some programming examples Video encoding: encode a raw 2GB raw file to mp4. • CPU bound. • File chunks can be encoded separately and then merged later. Parallelism is a plus, concurrency doesn't apply.
  • 19.
    Some programming examples AnIRC server: handle up to 50k IRC users chatting. • IO bound. • A huge number of clients that must be handled concurrently and mostly waiting.
  • 20.
    Concurrent Parallel Some programmingexamples An IRC server: handle up to 50k IRC users chatting. • IO bound. • A huge number of clients that must be handled concurrently and mostly waiting. Concurrency is needed, parallelism is superfluous.
  • 21.
    Know the difference Parallelism •Is never needed for correctness. • Is about performances, not correct behavior. • Is about exploiting multi-core and multi-CPU architectures. Concurrent programming • Can be needed for correctness. • Is about correct behavior, sometimes about performances too. • Is about multiple threads being responsive in concurrent.
  • 22.
    Know the difference Parallelism •Is never needed for correctness. • Is about performances, not correct behavior. • Is about exploiting multi-core and multi-CPU architectures. Concurrent programming • Can be needed for correctness. • Is about correct behavior, sometimes about performances too. • Is about multiple threads being responsive in concurrent. A good video encoding app: • Encodes 4 times faster on a 4-core CPU. That's parallelism.
  • 23.
    Know the difference Parallelism •Is never needed for correctness. • Is about performances, not correct behavior. • Is about exploiting multi-core and multi-CPU architectures. Concurrent programming • Can be needed for correctness. • Is about correct behavior, sometimes about performances too. • Is about multiple threads being responsive in concurrent. A good video encoding app: • Encodes 4 times faster on a 4-core CPU. That's parallelism. • Has a responsive GUI while encoding. That's concurrency.
  • 24.
    Who's best ? Ifyou are parallel, you are concurrent. So why bother ?
  • 25.
    Who's best ? Ifyou are parallel, you are concurrent. So why bother ? • Being parallel is much, much more difficult. That's time, money and programmer misery.
  • 26.
    Who's best ? Ifyou are parallel, you are concurrent. So why bother ? • Being parallel is much, much more difficult. That's time, money and programmer misery. • You can't be efficiently parallel past your hardware limit. Those are system calls, captain.
  • 27.
    Threads, callbacks So, howdo you write an echo server ?
  • 28.
    The sequential echoserver TCPServer server; server.listen(4242); while (true) { TCPSocket client = server.accept(); }
  • 29.
    The sequential echoserver TCPServer server; server.listen(4242); while (true) { TCPSocket client = server.accept(); while (true) { std::string line = client.read_until("n"); client.send(line); } }
  • 30.
    The sequential echoserver TCPServer server; server.listen(4242); while (true) { TCPSocket client = server.accept(); try { while (true) { std::string line = client.read_until("n"); client.send(line); } } catch (ConnectionClosed const&) {} }
  • 31.
    The sequential echoserver TCPServer server; server.listen(4242); while (true) { TCPSocket client = server.accept(); serve_client(client); }
  • 32.
    The sequential echoserver TCPServer server; server.listen(4242); while (true) { TCPSocket client = server.accept(); serve_client(client); } • Dead simple: you got it instantly. It's natural. • But wrong: we handle only one client at a time. • We need ...
  • 33.
    The sequential echoserver TCPServer server; server.listen(4242); while (true) { TCPSocket client = server.accept(); serve_client(client); } • Dead simple: you got it instantly. It's natural. • But wrong: we handle only one client at a time. • We need ... concurrency !
  • 34.
    The parallel echoserver TCPServer server; server.listen(4242); while (true) { TCPSocket client = server.accept(); serve_client(client); }
  • 35.
    The parallel echoserver TCPServer server; server.listen(4242); std::vector<std::thread> threads; while (true) { TCPSocket client = server.accept(); std::thread client_thread( [&] { serve_client(client); }); client_thread.run(); vectors.push_back(std::move(client_thread)); }
  • 36.
    The parallel echoserver TCPServer server; server.listen(4242); std::vector<std::thread> threads; while (true) { TCPSocket client = server.accept(); std::thread client_thread( [&] { serve_client(client); }); client_thread.run(); vectors.push_back(std::move(client_thread)); } • Almost as simple and still natural, • To add the concurrency property, we just added a concurrency construct to the existing.
  • 37.
    But parallelism istoo much • Not scalable: you can't run 50k threads.
  • 38.
    But parallelism istoo much • Not scalable: you can't run 50k threads. • Induces unwanted complexity: race conditions.
  • 39.
    But parallelism istoo much • Not scalable: you can't run 50k threads. • Induces unwanted complexity: race conditions. int line_count = 0; while (true) { TCPSocket client = server.accept(); while (true) { std::string line = client.read_until("n"); client.send(line); ++line_count; } }
  • 40.
    But parallelism istoo much • Not scalable: you can't run 50k threads. • Induces unwanted complexity: race conditions. int line_count = 0; while (true) { TCPSocket client = server.accept(); std::thread client_thread( [&] { while (true) { std::string line = client.read_until("n"); client.send(line); ++line_count; } }); }
  • 41.
    We need concurrencywithout threads We need to accept, read and write to socket without threads so without blocking.
  • 42.
    We need concurrencywithout threads We need to accept, read and write to socket without threads so without blocking. • Use select to monitor all sockets at once. • Register actions to be done when something is ready. • Wake up only when something needs to be performed.
  • 43.
    We need concurrencywithout threads We need to accept, read and write to socket without threads so without blocking. • Use select to monitor all sockets at once. • Register actions to be done when something is ready. • Wake up only when something needs to be performed. This is abstracted with the reactor design pattern: • libevent • Boost ASIO • Python Twisted • ...
  • 44.
    The callback-based echoserver Reactor reactor; TCPServer server(reactor); server.accept(&handle_connection); reactor.run();
  • 45.
    The callback-based echoserver Reactor reactor; TCPServer server(reactor); server.accept(&handle_connection); reactor.run(); void handle_connection(TCPSocket& client) { client.read_until("n", &handle_read); }
  • 46.
    The callback-based echoserver Reactor reactor; TCPServer server(reactor); server.accept(&handle_connection); reactor.run(); void handle_connection(TCPSocket& client); void handle_read(TCPSocket& c, std::string const& l, Error e) { if (!e) c.send(l, &handle_sent); }
  • 47.
    The callback-based echoserver Reactor reactor; TCPServer server(reactor); server.accept(&handle_connection); reactor.run(); void handle_connection(TCPSocket& client); void handle_read(TCPSocket& c, std::string const& l, Error e); void handle_sent(TCPSocket& client, Error error) { if (!e) client.read_until("n", &handle_read); }
  • 50.
    How do we feel now?
    • This one scales to thousands of clients.
    • Yet to add the concurrency property, we had to completely change the way we think.
    • A bit more verbose and complex, but nothing too bad... right?
  • 51.
    Counting lines with threads

        try
        {
          while (true)
          {
            std::string line = client.read_until("\n");
            client.send(line);
          }
        }
        catch (ConnectionClosed const&)
        {}
  • 52.
    Counting lines with threads

        int lines_count = 0;
        try
        {
          while (true)
          {
            std::string line = client.read_until("\n");
            ++lines_count;
            client.send(line);
          }
        }
        catch (ConnectionClosed const&)
        {
          std::cerr << "Client sent " << lines_count << " lines\n";
        }
  • 53.
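    Counting lines with callbacks

        using namespace std::placeholders;

        void handle_connection(TCPSocket& client)
        {
          // The counter must outlive this function, hence the heap allocation.
          int* count = new int(0);
          client.read_until(
            "\n", std::bind(&handle_read, _1, _2, _3, count));
        }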
  • 54.
    Counting lines with callbacks

        void handle_connection(TCPSocket& client);

        void handle_read(TCPSocket& c, std::string const& l,
                         Error e, int* count)
        {
          if (e)
          {
            std::cerr << *count << std::endl;
            delete count; // Nobody else will use the counter.
          }
          else
          {
            ++*count; // One more line received.
            c.send(l, std::bind(&handle_sent, _1, _2, count));
          }
        }
  • 55.
    Counting lines with callbacks

        void handle_connection(TCPSocket& client);
        void handle_read(TCPSocket& c, std::string const& l,
                         Error e, int* count);

        void handle_sent(TCPSocket& client, Error error, int* count)
        {
          if (error)
          {
            std::cerr << *count << std::endl;
            delete count; // Nobody else will use the counter.
          }
          else
            client.read_until(
              "\n", std::bind(&handle_read, _1, _2, _3, count));
        }
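The three slides above can be condensed into a runnable sketch. Everything here is an assumption made for illustration: FakeSocket is an in-memory stand-in for a reactor-driven socket, and a std::shared_ptr replaces the raw new int so the counter is released automatically. The point it shows is the same: the counter can no longer be a local variable and must be threaded through every callback.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a socket driven by a reactor.
struct FakeSocket
{
  std::deque<std::string> incoming;           // Lines the client will send.
  std::vector<std::string> sent;              // Lines echoed back.
  std::deque<std::function<void ()>> pending; // Completed operations.

  // Schedule `cb(line, closed)`; `closed` is true once input is exhausted.
  void read_line(std::function<void (std::string const&, bool)> cb)
  {
    if (this->incoming.empty())
      this->pending.push_back([cb] { cb("", true); });
    else
    {
      std::string line = this->incoming.front();
      this->incoming.pop_front();
      this->pending.push_back([cb, line] { cb(line, false); });
    }
  }

  // Record the line as sent and schedule `cb` as the completion handler.
  void send(std::string const& line, std::function<void ()> cb)
  {
    this->sent.push_back(line);
    this->pending.push_back(std::move(cb));
  }

  // The "reactor loop": run completion callbacks until none is left.
  void run()
  {
    while (!this->pending.empty())
    {
      std::function<void ()> cb = std::move(this->pending.front());
      this->pending.pop_front();
      cb();
    }
  }
};

// Read one line, echo it, then start over. The counter travels along
// in every callback because no local variable survives between them.
void handle_read(FakeSocket& s, std::shared_ptr<int> count, int& result)
{
  assert(count != nullptr);
  s.read_line(
    [&s, count, &result] (std::string const& line, bool closed)
    {
      if (closed)
      {
        result = *count; // Connection closed: report the total.
        return;
      }
      ++*count;
      s.send(line, [&s, count, &result] { handle_read(s, count, result); });
    });
}

// Echo every line in `lines` and return how many were counted.
int count_echoed_lines(std::vector<std::string> const& lines)
{
  FakeSocket socket;
  socket.incoming.assign(lines.begin(), lines.end());
  int result = -1;
  handle_read(socket, std::make_shared<int>(0), result);
  socket.run();
  return result;
}
```

Compare with the threaded version: a one-line `++lines_count` became shared state passed by hand through two callbacks.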
  • 59.
    Callback-based programming considered harmful
    • Code is structured with callbacks.
    • Asynchronous operations break the flow arbitrarily.
    • You lose all syntactic scoping expressions (local variables, closures, exceptions, ...).
    • This is not natural. Damn, this is pretty much as bad as GOTO.
  • 63.
    Are we screwed?
    Threads
    • Respect your beloved semantics and expressiveness.
    • Don't scale and introduce race conditions.
    Callbacks
    • Scale.
    • Ruin your semantics. Painful to write, close to impossible to maintain.
    I lied when I said: we need concurrency without threads.
    We need concurrency without system threads.
  • 64.
    Coroutines
    Also known as:
    • green threads
    • userland threads
    • fibers
    • contexts
    • ...
  • 66.
    Coroutines
    • Separate execution contexts like system threads.
    • Userland: no need to ask the kernel.
    • Non-parallel.
    • Cooperative instead of preemptive: they yield to each other.
    By building on top of that, we have:
    • Scalability: no system thread involved.
    • No arbitrary race conditions: no parallelism.
    • A stack, a context: the code is natural.
  • 68.
    Coroutine-based scheduler
    • Make a scheduler that holds coroutines.
    • Embed a reactor in there.
    • Write a neat Socket class. When read, it:
      ◦ Unschedules itself.
      ◦ Asks the reactor to read.
      ◦ Passes a callback to reschedule itself.
      ◦ Yields control back.
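The four steps of the read operation can be sketched as follows, assuming a POSIX system that provides ucontext (makecontext, swapcontext). Coroutine, FakeReactor, read_line and read_two_lines are hypothetical names for this illustration only. FakeReactor merely queues callbacks in memory, and this is not how Infinit's reactor is implemented.

```cpp
#include <ucontext.h>

#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <vector>

// Hypothetical single coroutine with its own stack (POSIX ucontext).
class Coroutine
{
public:
  explicit Coroutine(std::function<void ()> body)
    : _body(std::move(body))
    , _stack(64 * 1024)
  {
    getcontext(&this->_context);
    this->_context.uc_stack.ss_sp = this->_stack.data();
    this->_context.uc_stack.ss_size = this->_stack.size();
    this->_context.uc_link = &this->_caller; // Go there when the body ends.
    makecontext(&this->_context, &Coroutine::trampoline, 0);
  }
  Coroutine(Coroutine const&) = delete;
  Coroutine& operator=(Coroutine const&) = delete;

  // Switch from the caller into the coroutine.
  void resume()
  {
    starting = this;
    swapcontext(&this->_caller, &this->_context);
  }

  // Switch from the coroutine back to whoever resumed it.
  void suspend()
  {
    swapcontext(&this->_context, &this->_caller);
  }

private:
  // Entry point; only runs on the first resume().
  static void trampoline()
  {
    starting->_body();
  }

  static Coroutine* starting;
  std::function<void ()> _body;
  std::vector<char> _stack;
  ucontext_t _context;
  ucontext_t _caller;
};

Coroutine* Coroutine::starting = nullptr;

// Hypothetical reactor: "reads" complete when run() gets to them.
struct FakeReactor
{
  std::deque<std::string> incoming;         // Data that will arrive.
  std::deque<std::function<void ()>> ready; // Completed operations.

  void async_read(std::function<void (std::string)> on_line)
  {
    assert(!this->incoming.empty());
    std::string line = this->incoming.front();
    this->incoming.pop_front();
    this->ready.push_back([on_line, line] { on_line(line); });
  }

  void run()
  {
    while (!this->ready.empty())
    {
      std::function<void ()> callback = std::move(this->ready.front());
      this->ready.pop_front();
      callback();
    }
  }
};

// Looks blocking to the caller, yet never blocks the system thread.
std::string read_line(Coroutine& self, FakeReactor& reactor)
{
  std::string result;
  reactor.async_read(    // Ask the reactor to read...
    [&] (std::string line)
    {
      result = std::move(line);
      self.resume();     // ...with a callback that reschedules us.
    });
  self.suspend();        // Yield control back until then.
  return result;
}

// Demo: a coroutine reads two lines with plain sequential code.
std::vector<std::string> read_two_lines()
{
  FakeReactor reactor;
  reactor.incoming = {"first", "second"};
  std::vector<std::string> log;
  Coroutine* self = nullptr;
  Coroutine reader(
    [&]
    {
      log.push_back(read_line(*self, reactor));
      log.push_back(read_line(*self, reactor));
    });
  self = &reader;
  reader.resume(); // Runs until the first read suspends.
  log.push_back("reactor running");
  reactor.run();
  return log;
}
```

The body of reader contains no callback at all: the callback lives inside read_line, which is exactly what a coroutine-aware Socket class would hide.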
  • 69.
    Coroutine-based echo server

        TCPServer server;
        server.listen(4242);
        std::vector<Thread> threads;
        int lines_count = 0;
        while (true)
        {
          TCPSocket client = server.accept();
          Thread t(
            [&lines_count, client = std::move(client)] () mutable
            {
              try
              {
                while (true)
                {
                  std::string line = client.read_until("\n");
                  ++lines_count; // No race: coroutines never run in parallel.
                  client.send(line);
                }
              }
              catch (ConnectionClosed const&)
              {}
            });
          threads.push_back(std::move(t));
        }
  • 71.
    What we built at Infinit: the reactor.
    • Coroutine scheduler: simple round robin
    • Sleeping, waiting
    • Timers
    • Synchronization
    • Mutexes, semaphores
    • TCP networking
    • SSL
    • UPnP
    • HTTP client (Curl based)
  • 73.
    Coroutine scheduling

        reactor::Scheduler sched;
        reactor::Thread t1(sched,
          [&]
          {
            print("Hello 1");
            reactor::yield();
            print("Bye 1");
          });
        reactor::Thread t2(sched,
          [&]
          {
            print("Hello 2");
            reactor::yield();
            print("Bye 2");
          });
        sched.run();

    Output:

        Hello 1
        Hello 2
        Bye 1
        Bye 2
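For readers wondering how yield produces that interleaving, here is a minimal round-robin scheduler sketch, assuming a POSIX system with ucontext. The Scheduler class, its spawn, yield and run members, and hello_bye are hypothetical and much simpler than the reactor library shown in the slides. They only illustrate the cooperative switch.

```cpp
#include <ucontext.h>

#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical minimal round-robin scheduler (not Infinit's reactor API).
class Scheduler
{
private:
  struct Coroutine
  {
    ucontext_t context;
    std::vector<char> stack;
    std::function<void ()> body;
    bool done = false;
  };

public:
  // Create a coroutine that will run `body` on its own stack.
  void spawn(std::function<void ()> body)
  {
    auto co = std::make_unique<Coroutine>();
    co->body = std::move(body);
    co->stack.resize(64 * 1024);
    getcontext(&co->context);
    co->context.uc_stack.ss_sp = co->stack.data();
    co->context.uc_stack.ss_size = co->stack.size();
    co->context.uc_link = &this->_main; // Back to run() when the body ends.
    makecontext(&co->context, &Scheduler::trampoline, 0);
    this->_ready.push_back(std::move(co));
  }

  // Called from inside a coroutine: hand control back to run().
  void yield()
  {
    assert(this->_current != nullptr);
    swapcontext(&this->_current->context, &this->_main);
  }

  // Resume every coroutine in turn until all bodies have returned.
  void run()
  {
    while (!this->_ready.empty())
    {
      std::unique_ptr<Coroutine> co = std::move(this->_ready.front());
      this->_ready.pop_front();
      this->_current = co.get();
      active = this;
      swapcontext(&this->_main, &co->context);
      this->_current = nullptr;
      if (!co->done)
        this->_ready.push_back(std::move(co)); // Back of the queue.
    }
  }

private:
  // Common entry point: finds its coroutine through the active scheduler.
  static void trampoline()
  {
    Coroutine* self = active->_current;
    self->body();
    self->done = true;
  }

  static Scheduler* active;
  ucontext_t _main;
  Coroutine* _current = nullptr;
  std::deque<std::unique_ptr<Coroutine>> _ready;
};

Scheduler* Scheduler::active = nullptr;

// Demo: the same two coroutines as on the slide, logging instead of printing.
std::vector<std::string> hello_bye()
{
  Scheduler sched;
  std::vector<std::string> log;
  sched.spawn(
    [&]
    {
      log.push_back("Hello 1");
      sched.yield();
      log.push_back("Bye 1");
    });
  sched.spawn(
    [&]
    {
      log.push_back("Hello 2");
      sched.yield();
      log.push_back("Bye 2");
    });
  sched.run();
  return log;
}
```

hello_bye() returns the lines in the order shown on the slide: each coroutine runs until it yields, then goes to the back of the queue.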
  • 76.
    Sleeping and waiting

        reactor::Thread t1(sched,
          [&]
          {
            print("Hello 1");
            reactor::sleep(500_ms);
            print("Bye 1");
          });
        reactor::Thread t2(sched,
          [&]
          {
            print("Hello 2");
            reactor::yield();
            print("World 2");
            reactor::yield();
            print("Bye 2");
          });

    Output ("Bye 1" comes last, once the 500 ms sleep is over):

        Hello 1
        Hello 2
        World 2
        Bye 2
        Bye 1
  • 79.
    Sleeping and waiting

        reactor::Thread t1(sched,
          [&]
          {
            print("Hello 1");
            reactor::sleep(500_ms);
            print("Bye 1");
          });
        reactor::Thread t2(sched,
          [&]
          {
            print("Hello 2");
            reactor::yield();
            print("World 2");
            reactor::wait(t1); // Wait
            print("Bye 2");
          });

    Output:

        Hello 1
        Hello 2
        World 2
        Bye 1
        Bye 2
  • 82.
    Synchronization: signals

        reactor::Signal task_available;
        std::vector<Task> tasks;
        reactor::Thread handler(
          [&]
          {
            while (true)
            {
              if (!tasks.empty()) // 1
              {
                std::vector<Task> mytasks = std::move(tasks);
                tasks.clear(); // Leave the moved-from vector in a known state.
                for (auto& task: mytasks)
                  ; // Handle task
              }
              else
                reactor::wait(task_available); // 4
            }
          });

        tasks.push_back(...); // 2
        task_available.signal(); // 3
  • 83.
    Synchronization: channels

        reactor::Channel<Task> tasks;
        reactor::Thread handler(
          [&]
          {
            while (true)
            {
              Task t = tasks.get();
              // Handle task
            }
          });

        tasks.put(...);
  • 85.
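    Mutexes
    But you said no race conditions! You lied again!

        reactor::Thread t(
          [&]
          {
            while (true)
            {
              // send() may yield: another coroutine can then modify
              // sockets in the middle of the iteration.
              for (auto& socket: sockets)
                socket.send("YO");
            }
          });

        {
          sockets.push_back(...);
        }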
  • 86.
    Mutexes
    But you said no race conditions! You lied again!

        reactor::Mutex mutex;
        reactor::Thread t(
          [&]
          {
            while (true)
            {
              reactor::wait(mutex);
              for (auto& socket: sockets)
                socket.send("YO");
              mutex.unlock();
            }
          });

        {
          reactor::wait(mutex);
          sockets.push_back(...);
          mutex.unlock();
        }
  • 87.
    Mutexes
    But you said no race conditions! You lied again!

        reactor::Mutex mutex;
        reactor::Thread t(
          [&]
          {
            while (true)
            {
              reactor::Lock lock(mutex);
              for (auto& socket: sockets)
                socket.send("YO");
            }
          });

        {
          reactor::Lock lock(mutex);
          sockets.push_back(...);
        }
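The move from explicit wait/unlock calls to a Lock object is the usual RAII idiom. A minimal sketch, assuming a hypothetical FakeMutex that only records its state (the real reactor::Mutex and reactor::Lock are not shown in the slides beyond their use):

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a coroutine mutex: it only tracks its state.
struct FakeMutex
{
  bool locked = false;

  void lock()
  {
    assert(!this->locked); // Not re-entrant in this sketch.
    this->locked = true;
  }

  void unlock()
  {
    this->locked = false;
  }
};

// RAII guard: acquires in the constructor, releases in the destructor.
class Lock
{
public:
  explicit Lock(FakeMutex& mutex)
    : _mutex(mutex)
  {
    this->_mutex.lock();
  }

  ~Lock()
  {
    this->_mutex.unlock();
  }

  Lock(Lock const&) = delete;
  Lock& operator=(Lock const&) = delete;

private:
  FakeMutex& _mutex;
};

// Returns true if the mutex was held inside the scope and released
// after an exception escaped it.
bool released_after_exception()
{
  FakeMutex mutex;
  bool held_inside = false;
  try
  {
    Lock lock(mutex);
    held_inside = mutex.locked;
    throw std::runtime_error("send failed");
  }
  catch (std::runtime_error const&)
  {}
  return held_inside && !mutex.locked;
}
```

With the explicit unlock() version, an exception thrown by send() would leave the mutex locked forever; the guard releases it on every exit path.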
  • 88.
    Networking: TCP
    We saw a good deal of TCP networking:

        try
        {
          reactor::TCPSocket socket("battle.net", 4242, 10_sec);
          // ...
        }
        catch (reactor::network::ResolutionFailure const&)
        {
          // ...
        }
        catch (reactor::network::Timeout const&)
        {
          // ...
        }
  • 89.
    Networking: TCP
    We saw a good deal of TCP networking:

        void serve(TCPSocket& client)
        {
          try
          {
            std::string auth = client.read_until("\n", 10_sec);
            if (!check_auth(auth))
              // Impossible with callbacks
              throw InvalidCredentials();
            while (true) { ... }
          }
          catch (reactor::network::Timeout const&)
          {}
        }
  • 90.
    Networking: SSL
    Transparent client handshaking:

        reactor::network::SSLSocket socket("localhost", 4242);
        socket.write(...);
  • 91.
    Networking: SSL
    Transparent server handshaking:

        reactor::network::SSLServer server(certificate, key);
        server.listen(4242);
        while (true)
        {
          auto socket = server.accept();
          reactor::Thread([&] { ... });
        }
  • 92.
    Networking: SSL
    Transparent server handshaking:

        SSLSocket SSLServer::accept()
        {
          auto socket = this->_tcp_server.accept();
          // SSL handshake
          return socket;
        }
  • 93.
    Networking: SSL
    Transparent server handshaking:

        reactor::Channel<SSLSocket> _sockets;

        void SSLServer::_handshake_thread()
        {
          while (true)
          {
            auto socket = this->_tcp_server.accept();
            // SSL handshake
            this->_sockets.put(socket);
          }
        }

        SSLSocket SSLServer::accept()
        {
          return this->_sockets.get();
        }
  • 94.
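    Networking: SSL
    Transparent server handshaking:

        void SSLServer::_handshake_thread()
        {
          while (true)
          {
            auto socket = this->_tcp_server.accept();
            // Move the socket into the coroutine: a reference to the
            // loop variable would dangle at the next iteration.
            reactor::Thread t(
              [this, socket = std::move(socket)] () mutable
              {
                // SSL handshake
                this->_sockets.put(std::move(socket));
              });
          }
        }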
  • 97.
    HTTP

        std::string google = reactor::http::get("google.com");

        reactor::http::Request r("kissmetrics.com/api",
                                 reactor::http::Method::PUT,
                                 "application/json",
                                 5_sec);
        r.write("{ event: \"login\"}");
        reactor::wait(r);

    • Chunking
    • Cookies
    • Custom headers
    • Upload/download progress
    • ... pretty much anything Curl supports (i.e., everything)
  • 100.
    HTTP streaming

        // Download everything, then parse.
        std::string content = reactor::http::get(
          "my-api.infinit.io/transactions");
        auto json = json::parse(content);

        // Parse while downloading.
        reactor::http::Request r(
          "my-api.production.infinit.io/transactions");
        assert(r.status() == reactor::http::Status::OK);
        // JSON is parsed on the fly.
        auto json = json::parse(r);

        // Upload while reading the file.
        reactor::http::Request r(
          "youtube.com/upload", reactor::http::Method::PUT);
        std::ifstream input("~/A new hope - BrRIP.mp4");
        std::copy(input, r);
  • 102.
    Better concurrency: futures, ...

        // Version 1: each request blocks until it completes.
        std::string transaction_id = reactor::http::put(
          "my-api.production.infinit.io/transactions");
        // Ask the user files to share.
        reactor::http::post("my-api.infinit.io/transaction/", file_list);
        std::string s3_token = reactor::http::get(
          "s3.aws.amazon.com/get_token?key=...");
        // Upload files to S3

        // Version 2: start the requests first, collect the results later.
        reactor::http::Request transaction(
          "my-api.production.infinit.io/transactions");
        reactor::http::Request s3(
          "s3.aws.amazon.com/get_token?key=...");
        // Ask the user files to share.
        auto transaction_id = transaction.content();
        reactor::http::Request list(
          "my-api.infinit.io/transaction/", file_list);
        auto s3_token = s3.content();
        // Upload files to S3
  • 103.
    Better concurrency: futures, ...
    [Figure: timelines of Version 1 (Wait meta, Ask files, Wait meta, Wait AWS) and Version 2 (Ask files).]
  • 105.
    How does it perform for us?
    • Notification server does perform:
      ◦ 10k clients per instance
      ◦ 0.01 load average
      ◦ 1 GB resident memory
      ◦ Cheap single-core 2.5 GHz (EC2)
    • Life is so much better:
      ◦ Code is easy and pleasant to write and read
      ◦ Everything is maintainable
      ◦ Send metrics on login without slowdown? No biggie.
      ◦ Try connecting to several interfaces and keep the first to respond? No biggie.
  • 106.