
Async await in C++



Writing concurrent code is becoming more and more important to leverage the parallelism of multicore architectures. The C++11 library introduced futures and promises as a first step towards task-based programming. However, C++ support for concurrency is still very limited. Other languages, like C# and Python, provide some form of resumable functions or coroutines, and in C# the async/await pattern makes it possible to write functions that suspend their execution while waiting for a computation or I/O to complete. This talk will describe a proposal for the addition of resumable functions and async/await in C++17. We will focus on the implementation of resumable functions on Windows, and we'll play with a first prototype of their implementation in the Visual Studio 2015 Preview. Finally, we will see how resumable functions can also be used to implement (lazy) generators, similar to the ones provided by "yield" statements in C#.

Published in: Software


  1. 1. C++ resumable functions. Paolo Severini, Criteo 1
  2. 2. Concurrency: why? Herb Sutter quotes: « Welcome to the Parallel Jungle » “The network is just another bus to more compute cores.” « C++ and Beyond 2012 » DON’T STOP! Blocking a thread is bad for scalability. • Client side: we want responsive apps (ex: iOS, Win8) • UIs must be fluid: never block the GUI thread! • In WinRT all APIs that can take > 50 ms are asynchronous only. • Server side: we want to serve the max # of requests/sec • Allocating too many threads is expensive • We don’t want to have threads blocked on I/O. 2
  3. 3. Concurrency: how? Concurrency is a huge topic. • Anthony Williams, C++ Concurrency in Action • Joe Duffy, Concurrent Programming on Windows • Scott Meyers, Effective Modern C++ • Bjarne Stroustrup, The C++ Programming Language (4th Edition) 3
  4. 4. Concurrency: how? Concurrency is not a new problem; there have been solutions for many years (OS threads, coroutines, synchronization objects, …). Now in C++11/14: • std::thread • std::mutex, std::lock_guard<>, std::unique_lock<> • std::condition_variable • std::atomic_xxx, std::atomic<>, std::atomic_thread_fence() • std::future<>, std::shared_future<> • std::promise<> • std::packaged_task<> • … Work for C++17: • Concurrency TS (Nonblocking futures (.then), executors, await) 4
  5. 5. Example: copy file vector<char> readFile(const string& inPath) { ifstream file(inPath, ios::binary | ios::ate); size_t length = (size_t)file.tellg(); vector<char> buffer(length); file.seekg(0, std::ios::beg); file.read(&buffer[0], length); return buffer; } size_t writeFile(const vector<char>& buffer, const string& outPath) { ofstream file(outPath, ios::binary); file.write(&buffer[0], buffer.size()); return (size_t)file.tellp(); } size_t copyFile(const string& inFile, const string& outFile) { return writeFile(readFile(inFile), outFile); } Both readFile() and writeFile() are blocking. 5
  6. 6. Threads It’s now possible to write portable multithreaded code in C++. Example: #include <thread> std::thread t([=]{ writeFile(readFile(inFile), outFile); }); ... t.join(); Note: no direct way to transfer back a value. 6
  7. 7. Multithreading is tricky! void copyFile(const string& inPath, const string& outPath) { ifstream infile(inPath, ios::binary | ios::ate); size_t length = (size_t)infile.tellg(); vector<char> buffer(length); infile.seekg(0, std::ios::beg); infile.read(&buffer[0], length); ofstream outfile(outPath, ios::binary); outfile.write(&buffer[0], buffer.size()); } std::thread copyLogFile(const string& outFile) { char inPath[] = "c:\\temp\\log.txt"; return std::thread(copyFile, inPath, outFile); } ... copyLogFile("c:\\temp\\out2.log").join(); Can you spot the bug?
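The transcript does not include the answer to the question on this slide. The most likely candidate is a lifetime bug: `std::thread` copies its arguments after array-to-pointer decay, so the local `inPath` array is stored as a `char*`, and the conversion to `std::string` only happens on the new thread, possibly after `copyLogFile()` has returned and the array is gone. A minimal sketch of one possible fix, assuming that reading of the bug; `recordPath` and `startWithOwnedPath` are hypothetical stand-ins for `copyFile` and `copyLogFile`:

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <thread>

// Hypothetical stand-in for copyFile(): it only records the length of the
// path it received, so the example needs no real files.
void recordPath(const std::string& inPath, std::size_t& lengthSeen) {
    lengthSeen = inPath.size();
}

std::thread startWithOwnedPath(std::size_t& lengthSeen) {
    char inPath[] = "log.txt";
    // Build the std::string here, while inPath is still alive. std::thread
    // then stores a copy of the string itself, not a pointer into this
    // function's stack frame.
    return std::thread(recordPath, std::string(inPath), std::ref(lengthSeen));
}
```

Passing `std::string(inPath)` makes the thread own its data; `std::ref` is needed only because this sketch writes a result back through a reference.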
  8. 8. Callbacks and task continuations We set up processing (task) pipelines to handle events: • With callbacks • Example from Win32: BOOL WriteFileEx(HANDLE hFile, LPCVOID lpBuffer, DWORD nNumberOfBytesToWrite, LPOVERLAPPED lpOverlapped, LPOVERLAPPED_COMPLETION_ROUTINE lpCompletionRoutine); Problem: “Spaghetti code” difficult to read and to maintain – callbacks are the modern goto! • With tasks and continuations • Using libraries like TPL or PPL. [Diagram: a task pipeline from “A request has arrived!” through steps A, B, C and D to “Send a response”.] 8
  9. 9. Tasks, futures and promises In C++11: • std::future<T>: the result of an async computation (task). It’s a proxy for a value that will become available. • std::promise<T>: provides a means of setting a value which can later be read through an associated std::future<T>. The act of invocation is separated from the act of retrieving the result. [Diagram: task 1 writes through the promise with set_value()/set_exception(); task 2 reads the value through the future with get()/wait(); the future is obtained from the promise with get_future().] 9
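The promise/future pair described above can be sketched with nothing but the C++11 standard library. The function name `produceAndConsume` and the doubling computation are illustrative assumptions, not part of the talk:

```cpp
#include <exception>
#include <future>
#include <stdexcept>
#include <thread>
#include <utility>

// The producer thread writes through the promise; the caller reads the
// value (or the exception) through the associated future.
int produceAndConsume(int input) {
    std::promise<int> promise;
    std::future<int> future = promise.get_future();

    std::thread producer([p = std::move(promise), input]() mutable {
        try {
            if (input < 0) throw std::invalid_argument("negative input");
            p.set_value(input * 2);
        } catch (...) {
            // Errors travel through the same channel as values.
            p.set_exception(std::current_exception());
        }
    });

    // Join first so the thread is never left joinable if get() throws.
    producer.join();
    return future.get();   // returns the value or rethrows the stored exception
}
```

This also covers the gap noted for raw threads: the promise is the channel that transfers a result back to the caller.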
  10. 10. std::async std::async(): starts an asynchronous task and returns a future which will eventually hold the task result. Example: size_t async_copyFile(const string& inFile, const string& outFile) { auto fut1 = std::async(readFile, inFile); auto fut2 = std::async([&fut1](const string& path){ return writeFile(fut1.get(), path); }, outFile); ... // do more work ... return fut2.get(); } 10
  11. 11. std::async When and where is the async function executed? • std::async behaves as a task scheduler. The behavior can be configured with an argument. There are two launch modes: • std::launch::async (greedy): starts a new thread to run the task. • std::launch::deferred (lazy): runs the task synchronously, only when the future is waited for. Concurrency != parallelism 11
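The difference between the two launch modes can be observed by comparing thread ids. This is a small sketch; the two function names are made up for the illustration:

```cpp
#include <future>
#include <thread>

// Deferred: the task runs lazily inside get(), on the thread that calls get().
bool deferredRunsOnCallingThread() {
    const std::thread::id caller = std::this_thread::get_id();
    auto fut = std::async(std::launch::deferred,
                          [] { return std::this_thread::get_id(); });
    return fut.get() == caller;
}

// Async: the task runs as if on a new thread, so its id differs from ours.
bool asyncRunsOnAnotherThread() {
    const std::thread::id caller = std::this_thread::get_id();
    auto fut = std::async(std::launch::async,
                          [] { return std::this_thread::get_id(); });
    return fut.get() != caller;
}
```

When no policy is passed, the default is `std::launch::async | std::launch::deferred`, which leaves the choice to the implementation.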
  12. 12. The problem with async… • The async feature was not designed very well! • Will this function block? void func() { future<int> f = start_async_task(); //... more code that doesn’t call f.get() or f.wait() } // future destructed here... Does it block waiting for the task to complete? It depends!!! • It blocks if the future was created by std::async with the std::launch::async policy • It does not block otherwise They need to fix this mess… 12
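The blocking destructor can be demonstrated without timing measurements: if the destructor of a future returned by `std::async(std::launch::async, ...)` waits for the task, a flag set at the end of the task must already be true when the scope exits. A minimal sketch, with hypothetical function names:

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

bool taskFinishedWhenScopeExits() {
    std::atomic<bool> done{false};
    {
        std::future<void> f = std::async(std::launch::async, [&done] {
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
            done = true;
        });
        // No f.get() or f.wait() here.
    }   // ~future() blocks until the task has completed
    return done.load();
}

bool promiseFutureDestructorReturns() {
    std::promise<int> p;                        // never fulfilled
    { std::future<int> f = p.get_future(); }    // would hang forever if it waited
    return true;
}
```

The second function only returns because a future obtained from a `std::promise` does not wait in its destructor, which is exactly the inconsistency the slide complains about.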
  13. 13. Broken promises C++11 futures and promises are still very limited. They are not easily composable. • We can only wait on one future at a time. No support for: • wait_any() • wait_all() • It is not possible to attach continuations. No support for: • then() Composable tasks would allow making the whole architecture non-blocking and event-driven. For more details, see Bartosz Milewski’s article: “Broken promises”. 13
  14. 14. .NET Tasks In .NET we have the TPL, Task Parallel Library • Task<T> is the equivalent of std::future<T>. • Supports composition with Task.WaitAny() and Task.WaitAll(). • Allows chaining sequences of tasks as continuations, with Task.ContinueWith(). Example: static Task<int> CopyFile(string inFile, string outFile) { var tRead = new Task<byte[]>(() => ReadFile(inFile)); var tWrite = tRead.ContinueWith((t) => WriteFile(outFile, t.Result)); tRead.Start(); return tWrite; } [Diagram: tRead followed by tWrite.] 14
  15. 15. Microsoft PPL PPL (Parallel Patterns Library) is Microsoft’s implementation of C++ tasks. • PPL Concurrency::task<T> ~= what a std::future<T> should be: • when_any() • when_all() • task::then() • The Windows Runtime (WinRT) provides many async APIs that return task<T> and uses PPL as its model of asynchrony in C++. • C++ REST SDK (Casablanca) • Library for asynchronous C++ code that connects with cloud-based services. • APIs for HTTP, Websocket, JSON, asynchronous streams, … • Designed to be completely asynchronous – uses PPL tasks everywhere. • Supports Windows, Linux, OS X, iOS, and Android! • Also provides a portable implementation of PPL tasks. 15
  16. 16. PPL example #include <ppltasks.h> Concurrency::task<size_t> ppl_copyFile(const string& inFile, const string& outFile) { return Concurrency::create_task([inFile]() { return readFile(inFile); }).then([outFile](const vector<char>& buffer) { return writeFile(buffer, outFile); }); } ... auto tCopy = ppl_copyFile(inPath, outPath); ... tCopy.wait(); 16
  17. 17. C++17: Concurrency TS Next major release • 8 Technical Specifications (TS): • Library fundamentals (optional<>, string_view) • Arrays (runtime-sized arrays, dynarray<>) • Parallelism (Parallel STL library, parallel algorithms) • Concurrency (Nonblocking futures (.then), executors, await) • File System (Portable file system access: Boost.Filesystem) • Networking (IP addresses, URIs, byte ordering) • Concepts Lite (Extensions for template type checking) • Transactional Memory 17
  18. 18. Towards a better std::future Proposal (N3857) for an extended version of futures, with the same semantics as PPL tasks: • future::when_any() • future::when_all() • future::then() • future::unwrap() Example: std::future<size_t> cpp17_copyFile(const string& inFile, const string& outFile) { return std::async([inFile]() { return readFile(inFile); }).then([outFile](const vector<char>& buffer) { return writeFile(buffer, outFile); }); } 18
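With plain C++11/14 futures, `.then()`-style chaining can only be approximated. The sketch below uses a free function named `then`, which is a hypothetical helper and not the API from N3857 or from any library. Unlike a real continuation, it parks one extra thread inside `get()` while it waits, which is the kind of blocking the proposal is meant to avoid:

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <utility>

// Hypothetical helper: runs `continuation` on the result of `input`.
// It emulates chaining by blocking a helper thread in input.get().
template <typename T, typename F>
auto then(std::future<T> input, F continuation)
    -> std::future<decltype(continuation(std::declval<T>()))> {
    return std::async(std::launch::async,
                      [](std::future<T> in, F fn) { return fn(in.get()); },
                      std::move(input), std::move(continuation));
}

// Usage: two chained steps, mirroring the read-then-write shape of the slide.
std::size_t chainedLength(const std::string& text) {
    auto first = std::async(std::launch::async, [text] { return text + text; });
    auto second = then(std::move(first),
                       [](const std::string& s) { return s.size(); });
    return second.get();
}
```

The future is passed by value and moved into the helper task, so the chain owns every intermediate result and nothing dangles if the caller drops `first` early.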
  19. 19. The problem with tasks Tasks do not work well with iterations or branches. • Let’s say that we want to copy a file asynchronously chunk by chunk. There is no easy way to construct a loop of continuations with tasks. We need to dynamically attach “recursive” task continuations: task<void> repeat() { return create_task(readFileChunkAsync()) .then(writeFileChunkAsync) .then([]() { if (not_completed()) { return repeat(); } else { return create_task([]{}); // empty task } }); } 19
  20. 20. Example: copy a file in chunks (blocking) string readFileChunk(ifstream& file, int chunkLength) { ... } void writeFileChunk(ofstream& file, const string& chunk) { ... } void copyFile_ppl_loop_blocking(const string& inFilePath, const string& outFilePath) { ifstream inFile(inFilePath, ios::binary | ios::ate); inFile.seekg(0, inFile.beg); ofstream outFile(outFilePath, ios::binary); string chunk; while (chunk = readFileChunk(inFile, 4096), !chunk.empty()) { writeFileChunk(outFile, chunk); } } 20
  21. 21. Example: copy a file in chunks (PPL) task<shared_ptr<string>> readFileChunk(shared_ptr<ifstream> file, int chunkLength); task<bool> writeFileChunk(shared_ptr<ofstream> file, shared_ptr<string> chunk); task<void> copyFile_repeat(shared_ptr<ifstream> inFile, shared_ptr<ofstream> outFile) { return readFileChunk(inFile, 4096) .then([=](shared_ptr<string> chunk) { return writeFileChunk(outFile, chunk); }) .then([=](bool eof) { if (!eof) { return copyFile_repeat(inFile, outFile); } else { return task_from_result(); } }); } task<void> copyFile_ppl_loop(const string& inFilePath, const string& outFilePath) { auto inFile = make_shared<ifstream>(inFilePath, ios::binary | ios::ate); inFile->seekg(0, inFile->beg); auto outFile = make_shared<ofstream>(outFilePath, ios::binary); return copyFile_repeat(inFile, outFile); } 21
  22. 22. PPL: a Casablanca sample (1/3) From sample ‘SearchFile’: // A convenient helper function to loop asynchronously until a condition is met. pplx::task<bool> _do_while_iteration(std::function<pplx::task<bool>(void)> func) { pplx::task_completion_event<bool> ev; func().then([=](bool guard) { ev.set(guard); }); return pplx::create_task(ev); } pplx::task<bool> _do_while_impl(std::function<pplx::task<bool>(void)> func) { return _do_while_iteration(func).then([=](bool guard) -> pplx::task<bool> { if (guard) { return ::_do_while_impl(func); } else { return pplx::task_from_result(false); } }); } pplx::task<void> do_while(std::function<pplx::task<bool>(void)> func) { return _do_while_impl(func).then([](bool){}); } 22
  23. 23. PPL: a Casablanca sample (2/3) // Function to create in data from a file and search for a given string writing all lines containing the string to memory_buffer. static pplx::task<void> find_matches_in_file(const string_t &fileName, const std::string &searchString, basic_ostream<char> results) { return file_stream<char>::open_istream(fileName).then([=](basic_istream<char> inFile) { auto lineNumber = std::make_shared<int>(1); return ::do_while([=]() { container_buffer<std::string> inLine; return inFile.read_line(inLine).then([=](size_t bytesRead) { if (bytesRead == 0 && inFile.is_eof()) { return pplx::task_from_result(false); } else if (inLine.collection().find(searchString) != std::string::npos) { results.print("line "); results.print((*lineNumber)++); (continues…) 23
  24. 24. PPL: a Casablanca sample (3/3) else if(inLine.collection().find(searchString) != std::string::npos) { results.print("line "); results.print((*lineNumber)++); return results.print(":").then([=](size_t) { container_buffer<std::string> outLine( std::move(inLine.collection())); return results.write(outLine, outLine.collection().size()); }).then([=](size_t) { return results.print("\r\n"); }).then([=](size_t) { return true; }); } else { ++(*lineNumber); return pplx::task_from_result(true); } }); }).then([=]() { // Close the file and results stream. return inFile.close() && results.close(); }); }); } 24
  25. 25. C# async-await C# 5.0 solution: async/await makes the code “look” synchronous! Example: copy a file chunk by chunk static private async Task CopyChunk(Stream input, Stream output) { byte[] buffer = new byte[4096]; int bytesRead; while ((bytesRead = await input.ReadAsync(buffer, 0, buffer.Length)) != 0) { await output.WriteAsync(buffer, 0, bytesRead); } } static public async Task CopyFile(string inFile, string outFile) { using (StreamReader sr = new StreamReader(inFile)) { using (StreamWriter sw = new StreamWriter(outFile)) { await CopyChunk(sr.BaseStream, sw.BaseStream); } } } 25
  26. 26. async/await in C# What actually happens when we await? • The functions pause and resume • The compiler transforms an ‘async’ function into a class that implements a state machine • All local variables become data members of the class • On ‘await’ the code attaches a continuation to the invoked task • When the invoked task completes, the continuation is called and the state machine resumes • In which thread? It depends on the current SynchronizationContext (either the same thread or any thread from the pool). All this gives the impression that the function pauses and resumes. 26
  27. 27. C# iterators (generators) • Async/await is not the only example of resumable functions in C# • C# iterator blocks contain the yield statement and lazily return a sequence of values: IEnumerable<int> Fib(int max) { int a = 0; int b = 1; while (a <= max) { yield return a; int next = a + b; a = b; b = next; } } • They are implemented with a state machine: the function Fib() is compiled as a class that implements enumerator interfaces. • On MoveNext() the state machine resumes from the last suspension point, and executes until the next yield statement. All this gives the impression that the function pauses and resumes. 27
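The state-machine transformation described above can be written out by hand. The sketch below is a C++ illustration of the idea, not the code any compiler actually emits; the names `FibStateMachine`, `moveNext`, `current` and `collectFib` are assumptions chosen to mirror the C# enumerator interface. The locals `a` and `b` become data members, and the state field remembers where to resume:

```cpp
#include <vector>

class FibStateMachine {
public:
    explicit FibStateMachine(int max) : max_(max) {}

    // Resumes from the last suspension point and runs to the next "yield".
    // Returns false once the sequence is exhausted.
    bool moveNext() {
        switch (state_) {
        case State::Start:          // first call: a and b hold their initial values
            break;
        case State::Suspended: {    // resume just after "yield return a"
            int next = a_ + b_;
            a_ = b_;
            b_ = next;
            break;
        }
        case State::Done:
            return false;
        }
        if (a_ <= max_) {           // the loop condition of the original function
            current_ = a_;
            state_ = State::Suspended;
            return true;
        }
        state_ = State::Done;
        return false;
    }

    int current() const { return current_; }

private:
    enum class State { Start, Suspended, Done };
    State state_ = State::Start;
    int max_;
    int a_ = 0;        // former local variables, now data members
    int b_ = 1;
    int current_ = 0;
};

// Usage: drain the sequence the way a foreach loop would.
std::vector<int> collectFib(int max) {
    std::vector<int> values;
    FibStateMachine fib(max);
    while (fib.moveNext()) values.push_back(fib.current());
    return values;
}
```

Nothing runs until `moveNext()` is called, which is what makes the sequence lazy.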
  28. 28. LINQ (to objects) • Based on C# generators • Declarative “query language” that operates on lazily generated sequences Example: a (slow!) generator of prime numbers: var primes = Enumerable.Range(2, max) .Where(i => Enumerable.Range(2, i - 2) .All(j => i%j != 0)); 28
  29. 29. Resumable functions (Coroutines) • Coroutines: generalization of routines, allow suspending and resuming execution at certain “suspension points”, preserving the execution context. • Boost libraries: Boost.Coroutine, Boost.Context • Use Posix API on Linux • Use Win32 Fibers on Windows • Quite fast (coroutine switch 50-80 CPU cycles) • Fibers: lightweight threads of execution. OS-managed coroutines. • Added to Windows NT 4.0 to support cooperative multitasking. • SwitchToFiber() yields the execution to another fiber. 29
  30. 30. Resumable functions in C++17 ? • Lots of proposals for resumable functions in C++17!!! • Current proposal: N4134 (Nishanov, Radigan: MSFT) • Two new keywords: await, yield • First prototype available in Visual Studio 2015 Preview Examples: future<size_t> copyFile(const string& inFile, const string& outFile) { string s = await readFileAsync(inFile); return await writeFileAsync(s, outFile); } generator<int> fib(int max) { int a = 0; int b = 1; while (a <= max) { yield a; int next = a + b; a = b; b = next; } } 30
  31. 31. Casablanca sample with resumable functions static future<void> find_matches_in_file(const string_t fileName, const std::string searchString, basic_ostream<char> results) { basic_istream<char> inFile = await file_stream<char>::open_istream(fileName); int lineNumber = 1; size_t bytesRead = 0; do { container_buffer<std::string> inLine; bytesRead = await inFile.read_line(inLine); if (inLine.collection().find(searchString) != std::string::npos) { await results.print("line "); await results.print(lineNumber++); await results.print(":"); container_buffer<std::string> outLine(std::move(inLine.collection())); await results.write(outLine, outLine.collection().size()); await results.print("\r\n"); } else { lineNumber++; } } while (bytesRead > 0 || !inFile.is_eof()); await inFile.close(); await results.close(); } 31
  32. 32. Copy a file in chunks with resumable functions task<void> copyFile(string inFilePath, string outFilePath) { auto inFile = make_shared<ifstream>(inFilePath, ios::binary | ios::ate); inFile->seekg(0, inFile->beg); auto outFile = make_shared<ofstream>(outFilePath, ios::binary); while (true) { string s = await readFileChunk(inFile, 4096); if (s.empty()) { break; } await writeFileChunk(outFile, s); } } 32
  33. 33. C++ resumable functions: how? Several possible implementations: • with a state machine (like C#) • with stackful coroutines (Fibers) • with stackless coroutines • Resumable functions require changes both to: • the language • New statements (async, await) • Compiler “transforms” resumable functions to support suspension and resumption • the library • Requires “improved futures” (like PPL tasks) • Need more advanced schedulers (executors). 33
  34. 34. Stackful coroutines Visual Studio 2014 CTP had a first implementation, based on Win32 Fibers • An async function has its own resumable side stack, separate from the “real thread” stack. • The side stack lives beyond the suspension points until logical completion. • Problem: not really scalable! • Each fiber reserves a stack (1MB by default), so a few thousand coroutines will exhaust all VM (in 32 bit). • Context switches are relatively expensive. 34
  35. 35. Stackful coroutines future<int> bar() { ... } future<void> foo() { await bar(); // suspends foo(), resumes bar() } [Diagram: the thread’s call stack, with fiber contexts for foo and bar each owning a separate call stack.] 35
  36. 36. Stackless coroutines (N4134) task<int> bar() { ... } task<R> foo(T1 a, T2 b) { // function body await bar(); // suspension point (suspends foo, resumes bar) } compiled as: task<R> foo(T1 a, T2 b) { auto rh = new resumable_handle<R, T1, T2>(a, b); (*rh)(); } The resumable_handle is a compiler-generated class that contains the state of a coroutine and implements the call operator (): void operator()() { // function body task<int> t = bar(); // suspension point if (!await_ready(t)) { await_suspend(t); } _result = await_resume(t); } [Diagram: the resumable_handle for foo holds _promise, the parameters a and b, and the local variables.] 36
  37. 37. Stackless coroutines (N4134) • Awaitable types are types for which a library provides the support for await statements, by implementing: bool await_ready(T& t) const void await_suspend(T& t, resumable_handle rh) void await_resume(T& t) • For example, for PPL tasks, the awaitable functions are: bool await_ready(task& t) const { return t.is_done(); } void await_suspend(task& t, resumable_handle rh) { t.then([rh](task&){ rh(); }); } void await_resume(task& t) { t.get(); } 37
  38. 38. Stackless coroutines (N4134) • The coroutine_promise class is a library class that implements the semantics of a particular type of coroutine (ex: await, generators, …). It must implement functions like: void set_result(T val) T get_return_object(resumable_handle rh) void set_exception(E e) AwaitType yield_value(T val) ... • Functions with await statements => compiler generates code that allocates a resumable_handle and uses library code for coroutine promises and awaitable types to implement the logic of a suspension/resumption point. Note: this is still just a proposal! (unlikely to be standardized in C++17). 38
  39. 39. Generators (N4134) Generator: resumable function that provides a sequence of values (“yielded” lazily, only when the next element is requested) Example: Fibonacci sequence generator<int> fib(int max) { int a = 0; int b = 1; while (a <= max) { yield a; int next = a + b; a = b; b = next; } } for (auto n : fib(1000)) { std::cout << n << std::endl; } • The library defines a generator<T> class, with a special kind of coroutine promise designed to support the yield statement. • Generators behave as ranges (provide input iterators). • Ideally, should be composable with LINQ-like operators and interoperable with Niebler’s ranges. [Diagram: the generator wraps the resumable_handle for fib and its _promise, and exposes begin() and end(); generator::iterator provides operator ++ () and operator == ().] 39
  40. 40. Generators (N4134) template<typename T> class generator { class iterator { resumable_handle _coro; iterator(resumable_handle rh) : _coro(rh) {} iterator(nullptr_t) {} // represents the end of the sequence iterator operator ++ () { _coro(); // resume execution return *this; } bool operator == (const iterator& rhs) { return _coro == rhs._coro; } const T& operator * () const { return _coro.promise()._CurrentValue; } }; resumable_handle _coro; generator(resumable_handle rh) : _coro(rh) {} iterator begin() { _coro(); // starts the coroutine and executes it until it terminates or yields. return iterator(_coro); } iterator end() { return iterator(nullptr); } }; 40
  41. 41. In conclusion… • Don’t block: write asynchronous code! • C++11 futures are a good start, but there is still work to do. • Meanwhile, use libraries (like PPL, not only on Windows). • Continuation-style code is complicated! We need help from the compiler and the libraries (ex: async/await). • Let’s hope the await proposal will be approved! • We will soon be able to write simple, elegant asynchronous code in C++! 41
  42. 42. Questions? ?