Organized by Donating to
R&Devents@criteo.com
criteo.com
Medium.com/criteo-labs
@CriteoEng #NYANconf
Debugging asynchronous scenarios
by Christophe Nasarre
Kevin Gosse
NYAN conference
First case: a service refuses to stop
• Still in running state in Windows Services panel
In production → take a memory snapshot
procdump -ma <pid>
Parallel Stack in Visual Studio
• Yes: VS is able to load a memory dump
• This is a nice way to visually see what is going on
→ We are waiting for ClusterClient.Dispose() to end
In production → take a memory snapshot
procdump -ma <pid>
Which foreground thread is still running?
What is ClusterClient.Dispose() waiting for?
Look at the code, Luke!
ActionBlock internals
Task ProcessMessage(TInput message)
{
...
}
In production → take a memory snapshot
procdump -ma <pid>
Which foreground thread is still running?
What is ClusterClient.Dispose() waiting for?
Look at the code, Luke!
Look for _agent state
ContinueWith internals (1|3)
Task ContinueWith(Action<Task> nextAction,…)
{
    Task task = new ContinuationTaskFromTask<TResult>(this, nextAction,…);
    base.ContinueWithCore(task, …);
    return task;
}
ContinueWith internals (2|3)
internal void ContinueWithCore(Task continuationTask, …)
{
    TaskContinuation taskContinuation =
        new StandardTaskContinuation(continuationTask, …);
    …
    if (!continuationTask.IsCompleted)
    {
        // add task to m_continuationObject
        if (!AddTaskContinuation(taskContinuation, …))
        {
            taskContinuation.Run(this, …);
        }
    }
}
ContinueWith internals (3|3)
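The ContinueWith slides show the continuation being stored on the antecedent task rather than on any thread's call stack. A minimal sketch of the observable behavior, assuming a TaskCompletionSource stands in for the pending operation (`ContinuationDemo` and `Run` are illustrative names, not from the talk):

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationDemo
{
    // Returns (ranBeforeCompletion, ranAfterCompletion).
    public static (bool, bool) Run()
    {
        var tcs = new TaskCompletionSource<int>();
        bool ran = false;

        // The continuation is registered on tcs.Task; no thread is blocked waiting for it.
        Task continuation = tcs.Task.ContinueWith(
            t => { ran = true; },
            TaskContinuationOptions.ExecuteSynchronously);

        bool before = ran;       // the antecedent has not completed yet
        tcs.SetResult(42);       // completing the antecedent triggers the continuation
        continuation.Wait();     // returns at once: the continuation has already run
        bool after = ran;
        return (before, after);
    }

    public static void Main()
    {
        var (before, after) = Run();
        // prints "before completion: False, after completion: True"
        Console.WriteLine($"before completion: {before}, after completion: {after}");
    }
}
```

In a dump taken before SetResult is called, no thread stack would mention the lambda; it is only reachable through the task it was attached to.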
async/await Internals (1|2)
async Task<long> AAA(CancellationToken token)
{
    Stopwatch tick = new Stopwatch();
    tick.Start();
    await BBB(token);
    tick.Stop();
    return tick.ElapsedMilliseconds;
}
async/await Internals (2|2)
async Task AAA()
{
    await BBB();
    ...
}
async Task BBB()
{
    ...
}
[Diagram, reference chain: Task (returned by BBB) → TaskSchedulerAwaitTaskContinuation** → Action → AsyncMethodBuilderCore+MoveNextRunner → AAA state machine → Task (returned by AAA)]
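To make the "AAA state machine" box more concrete, here is a hand-written approximation of what the compiler produces for `async Task AAA() { await BBB(); }`. It is a simplified sketch: the class names and the `Func<Task>` standing in for BBB are assumptions, and real generated code differs by compiler version and build configuration.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Approximation of the state machine generated for: async Task AAA() { await BBB(); }
public sealed class AaaStateMachine : IAsyncStateMachine
{
    public int State = -1;                  // -1: not started, 0: suspended at the await, -2: finished
    public AsyncTaskMethodBuilder Builder;  // owns the Task returned by AAA
    public Func<Task> Bbb;                  // hypothetical stand-in for the call to BBB()
    private TaskAwaiter _awaiter;

    public void MoveNext()
    {
        try
        {
            if (State == -1)
            {
                _awaiter = Bbb().GetAwaiter();
                if (!_awaiter.IsCompleted)
                {
                    State = 0;
                    var self = this;
                    // Registers MoveNext as a continuation on the Task returned by BBB.
                    Builder.AwaitUnsafeOnCompleted(ref _awaiter, ref self);
                    return;
                }
            }
            _awaiter.GetResult();           // rethrows if BBB failed
        }
        catch (Exception ex)
        {
            State = -2;
            Builder.SetException(ex);
            return;
        }
        State = -2;
        Builder.SetResult();                // completes the Task returned by AAA
    }

    public void SetStateMachine(IAsyncStateMachine stateMachine) { }
}

public static class StateMachineDemo
{
    public static Task AAA(Func<Task> bbb)
    {
        var sm = new AaaStateMachine { Builder = AsyncTaskMethodBuilder.Create(), Bbb = bbb };
        sm.Builder.Start(ref sm);           // runs MoveNext synchronously up to the first await
        return sm.Builder.Task;
    }

    public static void Main()
    {
        var bbb = new TaskCompletionSource<bool>();
        Task aaa = AAA(() => bbb.Task);
        Console.WriteLine(aaa.IsCompleted); // False: AAA is suspended at its await
        bbb.SetResult(true);                // resumes the state machine
        aaa.Wait();
        Console.WriteLine(aaa.IsCompleted); // True
    }
}
```

While AAA is suspended, the only thing keeping it alive is the continuation registered on BBB's task, which is why following references from the blocked task is the way to find it in a dump.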
In production → take a memory snapshot
procdump -ma <pid>
Which foreground thread is still running?
What is ClusterClient.Dispose() waiting for?
Look at the code, Luke!
Look for _agent state
→ Exception broke the responses ActionBlock
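The following sketch shows one way an exception can break an ActionBlock. It does not reproduce the ClusterClient code from the talk: the block, the message values and the failure are hypothetical. It assumes System.Threading.Tasks.Dataflow is available (a NuGet package on .NET Framework).

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class FaultedBlockDemo
{
    // Returns (completionFaulted, postAccepted, processedCount).
    public static async Task<(bool, bool, int)> RunAsync()
    {
        int processed = 0;
        var block = new ActionBlock<int>(message =>
        {
            if (message < 0)
                throw new InvalidOperationException("hypothetical processing failure");
            processed++;
        });

        block.Post(1);
        block.Post(-1);                      // the delegate throws on this message

        try { await block.Completion; }      // the unhandled exception faults the block
        catch (Exception) { }

        bool accepted = block.Post(2);       // a faulted block declines new messages
        return (block.Completion.IsFaulted, accepted, processed);
    }

    public static void Main()
    {
        var (faulted, accepted, processed) = RunAsync().GetAwaiter().GetResult();
        // prints "faulted: True, post accepted: False, processed: 1"
        Console.WriteLine($"faulted: {faulted}, post accepted: {accepted}, processed: {processed}");
    }
}
```

Once the block is faulted, anything that waits for a message to be processed by it waits forever, which matches the symptom of a Dispose() call that never returns.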
BONUS: more continuations
• A few other continuation scenarios that you may encounter
✓ Task.Delay
✓ Task.WhenAny
✓ Special cases
Why a List<object> as continuation?
Task DoStuffAsync()
{
    var task = SendAsync();
    task.ContinueWith(t => LogStuff(t));
    return task;
}
// user code
await DoStuffAsync();
DoSomethingSynchronously();
[Diagram: Task.m_continuationObject goes from null, to a StandardTaskContinuation, to a List<object> holding the StandardTaskContinuation and a *TaskContinuation]
Why an empty List<object> as continuation?
async Task DoStuffAsync()
{
    var T1 = Task.Run(…);
    var T2 = Task.Run(…);
    await Task.WhenAny(T1, T2);
    … // T2 ends first
}
[Diagram: T1.m_continuationObject goes from null, to a CompleteOnInvokePromise, to an empty List<object>; T2.m_continuationObject goes from null, to a CompleteOnInvokePromise, to an object]
Investigation 1 - key takeaways
1. Thread call stacks do not give the full picture
• Even Visual Studio's Parallel Stacks view is not enough
2. You need a clear understanding of Task internals
• m_continuationObject and state machines
3. Start from the blocked task and follow the reverse reference chain
• sosex !refs is your friend
Symptoms: 0% CPU and a rising thread count
In production → take a memory snapshot
procdump -ma <pid>
look at call stacks in Visual Studio
In production → take a memory snapshot
procdump -ma <pid>
look at call stacks in Visual Studio
→ what are those tasks (we are waiting for) doing?
In production → take a memory snapshot
procdump -ma <pid>
look at call stacks in Visual Studio
→ what are those tasks (we are waiting for) doing?
look at tasks in WinDBG
→ no deadlock but everything is blocked…
ThreadPool internals
static void ProcessRequest()
{
    var task = CallbackAsync();
    task.Wait();
}
[Animation: the ThreadPool queue fills with R and C work items; R presumably stands for ProcessRequest and C for CallbackAsync]
DEMO
Simple ThreadPool starvation code
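The demo code itself is not part of the slides. As a rough stand-in, the sketch below applies the ProcessRequest pattern shown earlier to a small, bounded burst of requests so that it terminates on its own. `StarvationDemo`, `Run` and the request count are assumptions, and the default minimum worker thread count is assumed to equal the core count.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class StarvationDemo
{
    static async Task CallbackAsync()
    {
        await Task.Yield();      // the rest of the method needs a free ThreadPool thread
    }

    static void ProcessRequest()
    {
        var task = CallbackAsync();
        task.Wait();             // blocks a ThreadPool thread until the callback has run
    }

    // Queues `requests` work items on the ThreadPool and returns how many finished.
    public static int Run(int requests)
    {
        int completed = 0;
        using (var done = new CountdownEvent(requests))
        {
            for (int i = 0; i < requests; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    ProcessRequest();
                    Interlocked.Increment(ref completed);
                    done.Signal();
                });
            }
            done.Wait();         // called from the main thread, not from a pool thread
        }
        return completed;
    }

    public static void Main()
    {
        // A few more requests than the assumed default minimum worker thread count.
        int requests = Environment.ProcessorCount + 4;
        var watch = Stopwatch.StartNew();
        int completed = Run(requests);
        Console.WriteLine($"{completed} requests in {watch.ElapsedMilliseconds} ms, " +
                          $"{Process.GetCurrentProcess().Threads.Count} threads");
    }
}
```

Each request does almost no work, yet the burst only finishes once the pool has added enough threads to run the callbacks. With a continuous stream of requests instead of a bounded burst, newly added threads can keep picking up more requests and blocking in turn.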
ThreadPool internals
[Diagram: a global queue plus one local queue per thread (Thread 1, Thread 2), with Task 1 to Task 6 spread across the queues]
ThreadPool internals
[Diagram: a global queue and three threads (Thread 1, Thread 2, Thread 3), each with a C item in its local queue, plus four R items]
In production → take a memory snapshot
procdump -ma <pid>
look at call stacks in Visual Studio
→ what are those tasks (we are waiting for) doing?
look at tasks in WinDBG
→ no deadlock but everything is blocked…
→ ThreadPool is starved
Investigation 2 - key takeaways
1. Waiting synchronously on a Task is dangerous
2. ThreadPool scheduling is unfair
3. 0% CPU + increasing thread count = sign of ThreadPool starvation
Conclusion
• Understand the underlying data structures
• Think in terms of causality chains instead of thread call stacks
• Visual Studio is your friend
• Parallel Stacks to get the big picture
• WinDBG is your true ally
• Use and abuse sosex !refs
• You knew that waiting on tasks is bad
• Now you know why
Resources
Criteo blog series
• http://labs.criteo.com/
• https://medium.com/@kevingosse
• https://medium.com/@chnasarre
Debugging extensions
• https://github.com/chrisnas/DebuggingExtensions (aka Grand Son Of Strike)
Contacts
• Kevin Gosse @kookiz
• Christophe Nasarre @chnasarre

NYAN Conference: Debugging asynchronous scenarios in .net
