NDC Sydney 2019 conference in Sydney, AU - 2019/10/15
Talk: War stories from .NET team by Karel Zikmund
https://sessionize.com/s/karel-zikmund/async-demystified/24175
https://www.youtube.com/watch?v=TgUYcZV-foM
4. APM pattern
IAsyncResult BeginFoo(..., AsyncCallback callback, object state);
T EndFoo(IAsyncResult iar); // returns the result (void if there is none)
Synchronous call:
Foo();
Achieving the same:
EndFoo(BeginFoo(..., null, null));
Taking advantage of the asynchronous nature:
BeginFoo(..., iar => {
    T val = EndFoo(iar);
    // do stuff ...
}, null);
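A minimal sketch of the pattern above against a real BCL Begin/End pair, Stream.BeginRead / Stream.EndRead. The class and method names (ApmDemo, ReadWithCallback, ReadBlocking) are made up for illustration, and the ManualResetEventSlim exists only so the demo can hand the value back to the caller.

```csharp
using System;
using System.IO;
using System.Threading;

static class ApmDemo
{
    // "Taking advantage of the asynchronous nature": the callback calls EndRead.
    public static int ReadWithCallback()
    {
        var input = new MemoryStream(new byte[] { 1, 2, 3, 4 });
        var buffer = new byte[4];
        int bytesRead = -1;
        using var done = new ManualResetEventSlim();

        input.BeginRead(buffer, 0, buffer.Length, iar =>
        {
            // EndRead returns the result, or rethrows the operation's exception.
            bytesRead = input.EndRead(iar);
            done.Set();
        }, null /* state */);

        done.Wait(); // demo only: wait so the value can be returned
        return bytesRead;
    }

    // "Achieving the same" as the synchronous call: EndRead blocks until done.
    public static int ReadBlocking()
    {
        var input = new MemoryStream(new byte[] { 1, 2, 3, 4 });
        var buffer = new byte[4];
        return input.EndRead(input.BeginRead(buffer, 0, buffer.Length, null, null));
    }
}
```

Both methods return the number of bytes read; only the first one leaves the calling thread free while the operation is in flight.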
5. APM – Example
Copy stream to stream:
int bytesRead;
while ((bytesRead = input.Read(buffer)) != 0) {
output.Write(buffer, 0 /* offset */, bytesRead);
}
6. APM – Nesting problem
input.BeginRead(..., iar => {
    int bytesRead = input.EndRead(iar);
    output.BeginWrite(..., iar2 => {
        output.EndWrite(iar2); // EndWrite returns void
        input.BeginRead(..., iar3 => {
            int bytesRead3 = input.EndRead(iar3);
            output.BeginWrite(..., iar4 => {
                // ... again and again
            }, null);
        }, null);
    }, null);
}, null);
7. APM – CompletedSynchronously
IAsyncResult r = BeginRead(..., iar => {
    if (!iar.CompletedSynchronously) {
        // ... asynchronous path as shown earlier
    }
}, null);
if (r.CompletedSynchronously) {
    // ... synchronous path
}
• Even worse in loop
• Overall very complicated
• Queueing on ThreadPool much simpler
8. EAP:
Event-based Asynchronous Pattern
• .NET Framework 2.0
obj.Completed += (sender, eventArgs) => {
    // ... my event handler
};
obj.SendPacket(); // returns void
• Did not solve multiple-calls problem, or loops
• Introduced ExecutionContext
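A small sketch of what an EAP-style class and its consumer might look like. PacketSender and PacketSentEventArgs are invented for this example (they mirror the slide's obj.SendPacket()), and the "I/O" is just a thread-pool work item.

```csharp
using System;
using System.Threading;

// Hypothetical EAP-style class: kick off the operation, get an event when done.
sealed class PacketSender
{
    public event EventHandler<PacketSentEventArgs> Completed;

    public void SendPacket(byte[] payload) // returns void, result arrives via the event
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            // Pretend the payload was sent, then raise the event on the pool thread.
            Completed?.Invoke(this, new PacketSentEventArgs(payload.Length));
        });
    }
}

sealed class PacketSentEventArgs : EventArgs
{
    public PacketSentEventArgs(int bytesSent) { BytesSent = bytesSent; }
    public int BytesSent { get; }
}

static class EapDemo
{
    public static int SendOnce()
    {
        var sender = new PacketSender();
        int sent = 0;
        using var done = new ManualResetEventSlim();

        sender.Completed += (s, e) => { sent = e.BytesSent; done.Set(); };
        sender.SendPacket(new byte[3]);

        done.Wait(); // demo only
        return sent;
    }
}
```

With a single handler per object, a second SendPacket call on the same instance cannot tell its completion apart from the first one, which is the multiple-calls problem mentioned above.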
9. Task
• .NET Framework 4.0
• MSR project – parallel computing
• Divide & conquer efficiently (e.g. QuickSort)
• Shaped Task – similar to today
• Task – represents general work (compute, I/O bound, etc.)
= promise / future / other terminology
• Task / Task<T> – operation (with optional result T) contains:
1. T … in the case of Task<T>
2. State related to synchronization
3. State related to callback
10. Task / TaskCompletionSource
• Task
• Here's a callback, invoke it when you're done, or right now if you've already
completed
• I want to block here, until your work is done
• Cannot be completed by user directly
• TaskCompletionSource … wrapper for Task
• Holds Task internally and operates on it via internal methods
• Methods:
• SetResult
• SetException
• SetCanceled
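A short sketch of the producer/consumer split: the Task is what consumers see, and only the holder of the TaskCompletionSource can complete it. TcsDemo and its method names are illustrative only. The cancellation method in the BCL is spelled SetCanceled.

```csharp
using System;
using System.Threading.Tasks;

static class TcsDemo
{
    public static int CompleteByHand()
    {
        var tcs = new TaskCompletionSource<int>();
        Task<int> task = tcs.Task;      // hand this to consumers
        if (task.IsCompleted) throw new InvalidOperationException("not completed yet");

        tcs.SetResult(42);              // the producer decides when the work is done
        return task.Result;             // does not block: the task is already completed
    }

    public static bool FaultByHand()
    {
        var tcs = new TaskCompletionSource<int>();
        tcs.SetException(new InvalidOperationException("boom"));
        return tcs.Task.IsFaulted;
    }

    public static bool CancelByHand()
    {
        var tcs = new TaskCompletionSource<int>();
        tcs.SetCanceled();
        return tcs.Task.IsCanceled;
    }
}
```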
11. Task – Consumption
Task<T> t;
Either:
t.Wait(); // Blocks until Task is completed
Or:
t.ContinueWith(callback); // Will be executed after Task is completed
Even multiple times:
t.ContinueWith(callback2);
t.ContinueWith(callback3);
ContinueWith:
• Does not guarantee order of executions
• Always asynchronous (queued to ThreadPool/scheduler in general)
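A minimal illustration of hooking several continuations onto one task. ContinueWithDemo is a made-up name, and the task is driven by a TaskCompletionSource only so the example controls when it completes.

```csharp
using System.Threading.Tasks;

static class ContinueWithDemo
{
    public static int Run()
    {
        var tcs = new TaskCompletionSource<int>();
        Task<int> t = tcs.Task;

        // Two independent callbacks on the same task; their relative order is not guaranteed.
        Task<int> doubled = t.ContinueWith(a => a.Result * 2);
        Task<int> plusOne = t.ContinueWith(a => a.Result + 1);

        tcs.SetResult(20);
        Task.WaitAll(doubled, plusOne); // blocking is for the demo only

        return doubled.Result + plusOne.Result; // 40 + 21
    }
}
```

Each ContinueWith call returns a new task, so continuations can themselves be waited on or continued.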
12. Task.Run
We complicated things
Task<T> Task.Run(delegate d)
• Adds field to Task with ‘d’
• Queues work to ThreadPool
• Thread grabs it, executes it, marks task completed
13. Task.Run implementation
Task<T> Run<T>(Func<T> f) {
    var tcs = new TaskCompletionSource<T>();
    ThreadPool.QueueUserWorkItem(_ => {
        try {
            T result = f();
            tcs.SetResult(result);
        } catch (Exception ex) {
            tcs.SetException(ex);
        }
    });
    return tcs.Task;
}
14. async-await
.NET Framework 4.5 / C# 5
Example of asynchronous code:
Task<int> GetDataAsync();
Task PutDataAsync(int i);
Code:
Task<int> t = GetDataAsync();
t.ContinueWith(a => {
var t2 = PutDataAsync(a.Result);
t2.ContinueWith(b => Console.WriteLine("done"));
});
15. async-await
Task<int> t = GetDataAsync();
t.ContinueWith(a => {
var t2 = PutDataAsync(a.Result);
t2.ContinueWith(b => Console.WriteLine("done"));
});
C# 5 with async-await helps us:
Task<int> t = GetDataAsync();
int aResult = await t;
Task t2 = PutDataAsync(aResult);
await t2;
Console.WriteLine("done");
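For reference, a compilable version of the async-await snippet. The bodies of GetDataAsync and PutDataAsync are placeholders (a short Task.Delay standing in for network I/O), and AsyncAwaitDemo, CopyAsync and _stored are names invented for this sketch.

```csharp
using System.Threading.Tasks;

static class AsyncAwaitDemo
{
    private static int _stored;

    // Placeholder implementations: Task.Delay stands in for real I/O.
    private static async Task<int> GetDataAsync() { await Task.Delay(10); return 7; }
    private static async Task PutDataAsync(int i) { await Task.Delay(10); _stored = i; }

    public static async Task<int> CopyAsync()
    {
        int aResult = await GetDataAsync(); // continuation is hooked up by the compiler
        await PutDataAsync(aResult);
        return _stored;
    }
}
```

The method reads top to bottom like synchronous code, yet no thread is blocked while either delay is pending.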
16. Awaiter pattern
int aResult = await t;
Translated to:
var $awaiter1 = t.GetAwaiter();
if (! $awaiter1.IsCompleted) { // returns bool
// ... (complicated) ...
}
int aResult = $awaiter1.GetResult(); // returns void or T
// If exception, it will throw it
17. Awaiter pattern – details
void MoveNext() {
if (__state == 0) goto label0;
if (__state == 1) goto label1;
if (__state == 42) goto label42;
if (! $awaiter1.IsCompleted) {
__state = 42;
$awaiter1.OnCompleted(MoveNext);
return;
}
label42:
int aResult = $awaiter1.GetResult();
}
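The same shape can be written by hand, which may help to see what the generated code is doing. This is a simplified sketch, not the compiler's actual output: AwaiterDemo and AddOneAsync are hypothetical names, and a TaskCompletionSource plays the role of the method's returned task.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

static class AwaiterDemo
{
    // Hand-written equivalent of: int aResult = await t; return aResult + 1;
    public static Task<int> AddOneAsync(Task<int> t)
    {
        var tcs = new TaskCompletionSource<int>();
        TaskAwaiter<int> awaiter = t.GetAwaiter();

        void Continue()
        {
            try { tcs.SetResult(awaiter.GetResult() + 1); } // GetResult rethrows failures
            catch (Exception ex) { tcs.SetException(ex); }
        }

        if (!awaiter.IsCompleted)
            awaiter.OnCompleted(Continue); // come back here when the task completes
        else
            Continue();                    // already done: run the rest right away

        return tcs.Task;
    }
}
```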
18. State Machine
string x = Console.ReadLine();
int aResult = await t;
Console.WriteLine("done" + x);
State machine:
struct MethodFooStateMachine {
void MoveNext() { ... }
local1; // would be ‘x’ in example above
local2;
params;
_$awaiter1;
__state;
}
19. State Machine – Example
public async Task Foo(int timeout) {
await Task.Delay(timeout);
}
Compiler generates:
public Task Foo(int timeout) {
FooStateMachine sm = default;
sm._timeout = timeout;
sm._state = 0;
sm.MoveNext();
return ???;
}
struct FooStateMachine {
int _timeout; // param
// locals would be here too
void MoveNext() { ... }
int _state;
TaskAwaiter _$awaiter1;
}
20. State Machine – Example
public Task Foo(int timeout) {
FooStateMachine sm = default;
sm._tcs = new TaskCompletionSource();
sm._timeout = timeout;
sm._state = 0;
sm.MoveNext();
return sm._tcs.Task;
}
Builder pattern (can return struct):
AsyncValueTaskMethodBuilder.Create();
_tcs.Task -> _builder.Task;
struct FooStateMachine {
int _timeout; // param
// locals would be here too
void MoveNext() {
// ...
_tcs.SetResult(...);
}
int _state;
TaskAwaiter _$awaiter1;
TaskCompletionSource _tcs;
}
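A hand-written approximation of the state machine for Foo, using the real AsyncTaskMethodBuilder from System.Runtime.CompilerServices in place of the slide's _tcs. It is a sketch under simplifying assumptions: field names are invented, and the actual compiler output differs in detail (for example, it uses AwaitUnsafeOnCompleted).

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

struct FooStateMachine : IAsyncStateMachine
{
    public int Timeout;                     // param
    public int State;
    public AsyncTaskMethodBuilder Builder;  // plays the role of _tcs
    private TaskAwaiter _awaiter;

    public void MoveNext()
    {
        try
        {
            if (State == 0)
            {
                _awaiter = Task.Delay(Timeout).GetAwaiter();
                if (!_awaiter.IsCompleted)
                {
                    State = 1; // set before suspending so the boxed copy resumes correctly
                    Builder.AwaitOnCompleted(ref _awaiter, ref this);
                    return;
                }
            }
            _awaiter.GetResult(); // rethrows if the awaited task failed
        }
        catch (Exception ex)
        {
            Builder.SetException(ex);
            return;
        }
        Builder.SetResult();
    }

    public void SetStateMachine(IAsyncStateMachine stateMachine) =>
        Builder.SetStateMachine(stateMachine);
}

static class StateMachineDemo
{
    public static Task Foo(int timeout)
    {
        FooStateMachine sm = default;
        sm.Builder = AsyncTaskMethodBuilder.Create();
        sm.Timeout = timeout;
        sm.State = 0;
        sm.Builder.Start(ref sm); // runs MoveNext once
        return sm.Builder.Task;
    }
}
```

When Task.Delay has already completed, MoveNext runs straight through and the struct never leaves the stack; it is moved to the heap only if the method actually has to suspend.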
21. State Machine – Perf improvements
What about Task allocation?
• Builder can reuse known tasks
• Task.CompletedTask (without value)
• boolean – True/False
• int … <-1,8>
• LastCompleted (e.g. on MemoryStream)
• Does not work on SslStream (alternates headers and body)
• Size: 64B (no value) / 72B (with value)
• Azure workloads OK (GC will collect)
• Hot-path: up to 5%-10% via more GCs
22. ValueTask
• .NET Core 2.0
• Also as nuget package down-level
struct ValueTask<T> {
T;
Task<T>;
}
• Only one of them: T+null or default+Task<T>
• .NET Core 2.1
ValueTask<int> Stream.ReadAsync(Memory<byte>, ...)
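A sketch of how a method might return ValueTask<int> so that the common, already-buffered case allocates nothing. BufferedCounter is a hypothetical type loosely modeled on the BufferedStream scenario; the numbers (one unit per read, ten per refill) are arbitrary.

```csharp
using System.Threading.Tasks;

sealed class BufferedCounter
{
    private int _buffered; // units already available without I/O

    public ValueTask<int> ReadAsync()
    {
        if (_buffered > 0)
        {
            _buffered--;
            return new ValueTask<int>(1);          // T + null: no Task allocated
        }
        return new ValueTask<int>(RefillAsync());  // default + Task<int>
    }

    private async Task<int> RefillAsync()
    {
        await Task.Delay(10); // pretend to read 10 units in bulk
        _buffered = 9;        // one of them is returned to the caller now
        return 1;
    }
}
```

Only the refill call pays for a Task; the following nine reads complete synchronously from the buffer.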
23. ValueTask – Can we improve more?
• What about the 1% asynchronous case?
• .NET Core 2.1
struct ValueTask<T> {
T;
Task<T>;
IValueTaskSource<T>;
}
struct ValueTask {
Task;
IValueTaskSource;
}
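One way to see the IValueTaskSource<T> idea in code is a reusable source built on ManualResetValueTaskSourceCore<T> (available since .NET Core 3.0). This is only a sketch: ReusableSource, StartOperation and Complete are invented names, it supports one operation at a time, and it ignores thread-safety concerns that real implementations have to deal with.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Sources;

sealed class ReusableSource : IValueTaskSource<int>
{
    // Mutable struct: must not be a readonly field.
    private ManualResetValueTaskSourceCore<int> _core;

    // Hands out a ValueTask backed by this object; no Task is allocated.
    public ValueTask<int> StartOperation() => new ValueTask<int>(this, _core.Version);

    // Called by the producer when the operation finishes.
    public void Complete(int value) => _core.SetResult(value);

    public int GetResult(short token)
    {
        try { return _core.GetResult(token); }
        finally { _core.Reset(); } // make the object ready for the next operation
    }

    public ValueTaskSourceStatus GetStatus(short token) => _core.GetStatus(token);

    public void OnCompleted(Action<object> continuation, object state, short token,
                            ValueTaskSourceOnCompletedFlags flags) =>
        _core.OnCompleted(continuation, state, token, flags);
}
```

The token ties each ValueTask to one operation, which is why such a ValueTask must be awaited (or read) exactly once and never after the source has been reset.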
24. Summary
• APM pattern = Asynchronous Programming Model
• .NET Framework 1.0/1.1 (2002-2003)
• IAsyncResult, BeginFoo/EndFoo
• Limited nesting / loops
• EAP = Event-based Asynchronous Pattern (.NET Framework 2.0)
• Events – similar problems as APM
• Task (.NET Framework 4.0)
• Wait / ContinueWith
• TaskCompletionSource (for control/writing)
• async-await (.NET Framework 4.5 / C# 5)
• Awaiter pattern, state machine
• ValueTask (.NET Core 2.0)
• Don’t use unless you are on hot-path
• Hyper-optimizations possible, stay away, it is dangerous!
@ziki_cz
Editor's Notes
Internal talk
Warning: Some people find the first part a somewhat boring recap, others like it
IAsyncResult
AsyncWaitHandle – ManualResetEvent or AutoResetEvent
Across BCL
Usage of the APIs either:
Wait for callback to be called, or
Call EndFoo which will block until completed
END: Single operation works fine, but in reality you do more – e.g. in a loop
E.g. Network to disk
END: Manually it does not work – somehow turn it into loop
It’s possible but extremely long and tricky
Further complications with CompletedSynchronously (as perf optimization)… next slide
For perf reasons – handle synchronous completions right away
At the end:
Imagine the loop from before running on a MemoryStream: the data is already available, so instead of going through the ThreadPool, the delegate is called immediately
-> This leads to recursive calls -> stack overflow after ~10K nested calls
Bottom part:
Even BCL has lots of wrappers (e.g. in Networking: LazyAsyncResult) with lots of specializations
Very complicated
Straightforward idea – Completed event
Kick off operation, then Completed handler is invoked (generally on ThreadPool)
END:
ExecutionContext – basically a ThreadLocal; it survived for a long time and still exists today in some form
BCL: Used in 5-10 classes in the BCL … like SmtpClient, TcpClient, BackgroundWorker
Downsides:
We shipped it in .NET Framework 2.0 and quickly realized that it was an interesting experiment, but did not really address real needs
#2 -- “MSR Project”
90% right; the remaining 10% still keeps Toub awake at night even after 10 years, and he would love to change it
#3 – “Task”
NOT tied to ThreadPool – not tied to executing delegate
Shove result into it
Can be completed
Can wake up someone waiting on it
#1 – Task
Task – something to consume - hook up to
#1.3 – Cannot be completed - not to change directly (no control)
#2:
TaskCompletionSource – can alter state of Task … has control over Task
Example: Lazy initialization (something over network) … you are in charge who can change “work completed”
You don’t want others to decide when it is done
Wait - creates ManualResetEvent which will be signaled when one of SetResult/SetException/SetCanceled is called
END:
Option: TaskContinuationOptions.ExecuteSynchronously to do it synchronously
APM (IAsyncResult) … no shared implementation, everyone had to have their own implementation
Task model - you don't pass a delegate at creation, but you can walk up to any of them and say "call me when you're done"
Enabled abstractions – like async-await
await hooks up the callback … we will look into it later
Convenient method – but mixes up things
ties it to ThreadPool (breaks abstraction)
adds delegate to Task
END:
Sets completed = execute callback, waking up things waiting on it, etc.
#1 – TaskCompletionSource creates Task
#3 – return - returns Task to be awaited on, etc.
Now we implemented Task.Run without storing any delegate on Task
#2 - GetData/PutData … maybe across the network
END:
Compiler translates it to the code above (hand-waving involved)
Note: Compiler does not treat Task specially, but it just looks for pattern (awaiter pattern)
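A small demonstration of the pattern-matching point: any type becomes awaitable once a suitable GetAwaiter is visible, even through an extension method. The names TimeSpanAwaitExtensions, PatternDemo and WaitBrieflyAsync are made up for this sketch; reusing TaskAwaiter from Task.Delay is just the shortest way to get a conforming awaiter.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

static class TimeSpanAwaitExtensions
{
    // The compiler only needs to find GetAwaiter(); TimeSpan itself knows nothing about tasks.
    public static TaskAwaiter GetAwaiter(this TimeSpan delay) =>
        Task.Delay(delay).GetAwaiter();
}

static class PatternDemo
{
    public static async Task<string> WaitBrieflyAsync()
    {
        await TimeSpan.FromMilliseconds(10); // compiles because of the extension above
        return "done";
    }
}
```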
#3 – GetResult
Bold methods are pattern matching
#4 – complicated comment
Simplified version, things are in fact more complicated
“! IsCompleted” part is complicated – I have to hook up code that comes back here when task completes
#0 – Let’s look deeper into the “! IsCompleted”
#1 – All of it is part of MoveNext method -- it is a state machine, every await in method is state in state machine (hand waving a bit)
END:
OnCompleted has slightly more complicated signature
#1 – code
How does ‘x’ survive? (it is just on stack) – need to capture it
Same in lambda – C# compiler lifts local to keep it on heap allocated object (floats through closures)
Same in state machine
END:
Compiler optimizes – stores here things only crossing await boundary
Note: in debug builds the state machine is a class (for debuggability); in release builds it is a struct (for perf)
Why struct? These async methods often complete synchronously – example:
BufferedStream … large buffer behind with inner stream
If I ask for 1B, but it reads 10K in bulk, then lots of calls finish synchronously
If it was class, then we would allocate per call
END:
We will talk about return ??? (a Task) in more detail on the next slide
Let’s expand the generated code a bit – Foo and FooStateMachine
#1 – _tcs
_tcs (producers) is logically on state machine (a bit more complicated) … reminder: it has Task inside
#2 – new TaskCompletionSource
Allocated per call / operation
Problem #1: 2 allocations – TaskCompletionSource and Task (inside)
For the synchronous case we want 0 allocations ideally (BufferedStream example)
Problem #2: Even the Task/TaskCompletionSource is problematic, because the return type can be anything Task-like (compiler does not want to hardcode Task) … builder pattern instead of constructor new TaskCompletionSource()
Each Task-like type (except Task) has attribute defining builder - builder pattern
ValueTask (details later) has one -> AsyncValueTaskMethodBuilder
instead of new TaskCompletionSource() -> AsyncValueTaskMethodBuilder.Create();
We have internally in System.Runtime.CompilerServices structs:
AsyncTaskMethodBuilder, AsyncTaskMethodBuilder<T>, AsyncVoidMethodBuilder
#5 -- Task
instead of _tcs.Task -> _builder.Task
We eliminated TaskCompletionSource allocation – builder can return struct (ValueTask)
END: What about the Task?
CompletedTask – void return value – just info it finished vs. not
END: How can we improve the perf even more?
#2 “Only one of them”
Methods are 1-liners (if Task<T> == null, do something, else something else)
Nicely handles synchronously completing case
Note: Non-generic ValueTask does not make sense - only Task inside … we have CompletedTask
Problem: Overloading on return type
#3 – .NET Core 2.1
Luckily we introduced Memory<T> at the same time, as we cannot overload on return type
That's why sometimes in PRs we wrap byte[] in Memory first … to use the ValueTask
Design-guidelines: Start with Task … use ValueTask only in hot-path scenarios
IValueTaskSource – complicated interface
almost the awaiter pattern:
Are you completed?
Hook up a callback
Get a result
All implementations on ValueTask are now ternary
Value: You can implement it however you want, incl. reset (reuse)
Complicated to do, so not everywhere
Socket.SendAsync/ReceiveAsync
Object per send/receive … but one at a time (typical)
0 allocation for loop around Send/Receive on Socket
Same: NetworkStream, Pipelines, Channels