SlideShare a Scribd company logo
1 of 25
War stories
from .NET team
.NET Core Summer event 2019 – Brno, CZ
Karel Zikmund – @ziki_cz
Agenda
• Stories
• Investigations on .NET team
• Not just from me
• Lessons learned on the way
You won’t see any:
• Source code
• Debugger
Not needed: Deep .NET knowledge
Not on agenda
My First Serious Investigation
• Build lab for Windows component
• Build break 1x per week
• AccessViolation dialog hangs machine
• Toolset updated to 2.0 RTM
• Repro:
• Once in ~50 runs
• Overnight run: 247 crashes out of 77,006 runs (0.3%)
My First Serious Investigation - quotes
• "The actual crash is occurring on some boilerplate stack checking
code …“
• “Karel is relatively new to the code base so he indicated it might take
some time to understand what’s going on”
mscorwks!UTSemReadWrite::UnlockRead+0xe [f:rtmndpclrsrcutilcodeutsem.cpp @ 357]
mscorwks!CMDSemReadWrite::~CMDSemReadWrite+0x14 [f:rtm...mdencrwutil.cpp @ 1299]
mscorwks!RegMeta::DefineParam+0x196 [f:rtmndpclrsrcmdcompileremit.cpp @ 2719]
cscomp!EMITTER::EmitParamProp
cscomp!ParamAttrBind::Init
cscomp!ParamAttrBind::CompileParamList
cscomp!CLSDREC::compileMethod
cscomp!CLSDREC::CompileMember
cscomp!CLSDREC::EnumMembersInEmitOrder
cscomp!CLSDREC::compileAggregate
cscomp!CLSDREC::compileNamespace
cscomp!COMPILER::CompileAll
cscomp!COMPILER::Compile
cscomp!CController::RunCompiler
cscomp!CController::Compile
csc!main
My First Serious Investigation
My First Serious Investigation
• Who corrupts stack?
• GC?
• NO!
• Changed value between caller and callee
• Single bit changed
• Who corrupts it?
• GC card table updates?
• Of course NOT!
• What about HW?
• Naw!
• Or maybe?
My First Serious Investigation
• Does it by a chance reproduce on only one machine?
• Answer: How did you know?
• But why always the same callstack?
• Good question, no good answer … magic
• Lesson learned: Debugging HW errors is costly and hard
• Always ask: Does it repro on more than 1 machine?
Another MetaData story
MetaData format background:
• Basically database – rows and columns
• Example – TypeDef table:
• Indexes into tables/heaps are either 2B or 4B
• What happens if last TypeDef has no methods?
• MethodList = Number of methods + 1 = max + 1
• What happens if there is 0xffff methods?
Flags TypeName TypeNamespace Extends MethodList
(Public) “Foo” “Awesome.Story” … Method #10
(Private) “Bar” “Awesome.Story” … Method #11
Another MetaData story
• II.24.2.6 “#~ stream”
• If e is a simple index into a table with index i, it is stored using 2 bytes if table i has less than
2^16 rows, otherwise it is stored using 4 bytes.
• II.22.37 TypeDef : 0x02
• 21. If MethodList is non-null, it shall index a valid row in the MethodDef table, where valid
means 1 <= row <= rowcount+1 [ERROR]
• How do you fix it?
• “I’m on the fence whether we should (fix it), given it looks like people hit this about once in 17
years”
• https://github.com/dotnet/corefx/issues/29554
• Lesson learned: Not all bugs have to be fixed
TypeSystem – Collapsing interfaces
• Table of implemented interfaces:
class A : I, J {}
• With generics:
class C<T> : L<T> {}
class D<T> : C<T>, L<string> {}
class E : D<string>, I {}
0 1
I J
0 1 2
I J K
0 1
L<T> L<string>
0 1
L<string> I
0 1 2
L<string> L<string> I
class B : A, K {}
Fix:
Breaking changes – Intro
• Everyone wants fix for their bug
• But nobody wants to be broken
• Observation: 10% of fixes have unintended side-effects
• Extreme case: Perf improvement can break app
• How many customers?
• Lesson learned: Everything has risk of breaking someone
Breaking changes – Last build
• Finance app crashing – “last” build of Windows 8 on arm (Surface RT)
• Latent bug (introduced months ago)
• Bug triggered by:
1. Method in NGen image has to be across 8KB pages
2. GC has to be triggered at least twice when it’s on stack
• Unrelated change caused “unlucky” method order for:
• System.Net.Configuration.DefaultProxySectionInternal..ctor
• Lesson learned: Anything, really ANYTHING, has risk of breaking
Breaking changes – Huge impact
• Patch to .NET Framework broke certain tax SW
• Printing tax forms
• Update pushed few days before tax deadline in US
• Note: Printing was tested on both sides (Microsoft & tax SW
company)
• But only into file, not to printer
• Lessons learned: Be extra cautious around sensitive dates
Breaking changes – Below you
• RavenDB – blue screen after KB4487017 on .NET Core!
• dotnet/coreclr#22597
• PrefetchVirtualMemory
• Kernel memory
management bug
Networking – Security issue
• January: Researcher running ML models on Cosmos
• Suspicion about buffers – more logging
• March: Repro gone
• May: Similar report
• +2 weeks: It blows up (more teams & impact)
• All hands on-deck
• Small repro (20 min, then 1 min) … yay!
• TTD trace (iDNA / TTT) … bonus & life saver
Networking – Security issue
• Root-cause: HTTP pipelining under stress
• 13 years old bug (.NET 2.0)
Response 1
Request 1
Server
Response 1
Request 1
Server
Request 2
Response 2
Networking – Security issue
Request 1
Server
Request 2Request 3
Response 1Response 2
Networking – Security issue
Request 1
Server
Request 2Request 3
Response 1Response 2
Networking – Security issue
• We have workaround (disable pipelining) – perf impact
• Worked fix …
• Verifying fix …
• Repro fails after 4h 
• Same symptoms
• Repro sensitive to cloud network load (8-17)
• TTD (iDNA / TTT) does not work 
• Suspicion about buffers again
Networking – Security issue
• Bad buffer lifetime management – on sending side!
• 5 years old bug (.NET 4.5.2)
• Trigger found:
• Thanks to Skype team – 24h deployment of experiments
• Change in .NET 4.7.1
• Fix around the problematic area
• Making the opportunity window SMALLER!
• … counter-intuitive
• Code review – similar bug on receiving side (5 years old)
• Same symptoms as HTTP pipelining
Networking – Security issue
• Why so many customers/services hit it at once?
• Maybe Spectre & Meltdown fixes roll out?
• or just … magic
• Lesson learned: Weird coincidences can happen …
Developer’s pride in multi-threading
• School project (2000-2003)
• Game simulation server – heavily multi-threaded
• https://github.com/karelz/WarPlusPlus (nostalgia)
• Classic deadlock – 2 threads locking A and B in different order
• Deadlock avoidance started make sense
• WinRT binder (2010)
• Binder is tricky – GC interaction (NO_GC range)
• Type routed to WinMD file, assembly meaningless
• Negotiated on namespace only in 1 assembly
• Multiple reviews, discussions with architects
• Bugs start to come in after shipping (NullReferenceException)
Optimizations
• Once upon a time, … there was a service in Microsoft
• List vs. array data structure perf
• Perspectives:
1. The data structure will have in practice 3-5 items
2. There 3 hops between servers for each request!!!
• Lesson learned: Avoid premature optimizations … at all cost
Lessons learned
• Always ask: Does it repro on more than 1 machine?
• Debugging HW bugs is costly
• Some bugs happen once in 17 years
• Spec bugs are hard to fix
• MetaData format bug
• Anything, really ANYTHING, has risk of breaking someone
• Innocent changes can trigger latent bugs elsewhere
• Impact may be huge – e.g. during tax season
• Always try to create small repro
• Make your and everyone’s life easier
• TTD (iDNA / TTT) is life saver
• Avoid premature optimizations … at all cost, save your time
• … sometimes there is just … magic
@ziki_cz
Thank you
• Feedback welcome
• Twitter DM, email, in-person, etc.
• Survey
• What you liked vs. not?
• Too rushed?
• Hard to understand?
• Boring?
• Didn’t meet your expectations?
@ziki_cz

More Related Content

What's hot

0day hunting a.k.a. The story of a proper CPE test
0day hunting a.k.a. The story of a proper CPE test0day hunting a.k.a. The story of a proper CPE test
0day hunting a.k.a. The story of a proper CPE testBalazs Bucsay
 
CNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic Analysis
CNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic AnalysisCNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic Analysis
CNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic AnalysisSam Bowne
 
CNIT 127 Ch 16: Fault Injection and 17: The Art of Fuzzing
CNIT 127 Ch 16: Fault Injection and 17: The Art of FuzzingCNIT 127 Ch 16: Fault Injection and 17: The Art of Fuzzing
CNIT 127 Ch 16: Fault Injection and 17: The Art of FuzzingSam Bowne
 
CNIT 152: 6. Scope & 7. Live Data Collection
CNIT 152: 6. Scope & 7. Live Data CollectionCNIT 152: 6. Scope & 7. Live Data Collection
CNIT 152: 6. Scope & 7. Live Data CollectionSam Bowne
 
CNIT 126 11. Malware Behavior
CNIT 126 11. Malware BehaviorCNIT 126 11. Malware Behavior
CNIT 126 11. Malware BehaviorSam Bowne
 
Case Study of the Unexplained
Case Study of the UnexplainedCase Study of the Unexplained
Case Study of the Unexplainedshannomc
 
Sans london april sans at night - tearing apart a fileless malware sample
Sans london april   sans at night - tearing apart a fileless malware sampleSans london april   sans at night - tearing apart a fileless malware sample
Sans london april sans at night - tearing apart a fileless malware sampleMichel Coene
 
3. Security Engineering
3. Security Engineering3. Security Engineering
3. Security EngineeringSam Bowne
 
CNIT 126: 10: Kernel Debugging with WinDbg
CNIT 126: 10: Kernel Debugging with WinDbgCNIT 126: 10: Kernel Debugging with WinDbg
CNIT 126: 10: Kernel Debugging with WinDbgSam Bowne
 
Esage on non-existent 0-days, stable binary exploits and user interaction
Esage   on non-existent 0-days, stable binary exploits and user interactionEsage   on non-existent 0-days, stable binary exploits and user interaction
Esage on non-existent 0-days, stable binary exploits and user interactionDefconRussia
 
Browser Fuzzing with a Twist (and a Shake) -- ZeroNights 2015
Browser Fuzzing with a Twist (and a Shake) -- ZeroNights 2015Browser Fuzzing with a Twist (and a Shake) -- ZeroNights 2015
Browser Fuzzing with a Twist (and a Shake) -- ZeroNights 2015Jeremy Brown
 
Steelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with PythonSteelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with Pythoninfodox
 
AV Evasion with the Veil Framework
AV Evasion with the Veil FrameworkAV Evasion with the Veil Framework
AV Evasion with the Veil FrameworkVeilFramework
 
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbgPractical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbgSam Bowne
 
Practical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware BehaviorPractical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware BehaviorSam Bowne
 
Case study: formal verification of the Brain Fuck Scheduler
Case study: formal verification of the Brain Fuck SchedulerCase study: formal verification of the Brain Fuck Scheduler
Case study: formal verification of the Brain Fuck SchedulerMengxuan Xia
 
CNIT 152 12. Investigating Windows Systems (Part 3)
CNIT 152 12. Investigating Windows Systems (Part 3)CNIT 152 12. Investigating Windows Systems (Part 3)
CNIT 152 12. Investigating Windows Systems (Part 3)Sam Bowne
 
From zero to SYSTEM on full disk encrypted windows system
From zero to SYSTEM on full disk encrypted windows systemFrom zero to SYSTEM on full disk encrypted windows system
From zero to SYSTEM on full disk encrypted windows systemNabeel Ahmed
 

What's hot (20)

0day hunting a.k.a. The story of a proper CPE test
0day hunting a.k.a. The story of a proper CPE test0day hunting a.k.a. The story of a proper CPE test
0day hunting a.k.a. The story of a proper CPE test
 
CNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic Analysis
CNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic AnalysisCNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic Analysis
CNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic Analysis
 
CNIT 127 Ch 16: Fault Injection and 17: The Art of Fuzzing
CNIT 127 Ch 16: Fault Injection and 17: The Art of FuzzingCNIT 127 Ch 16: Fault Injection and 17: The Art of Fuzzing
CNIT 127 Ch 16: Fault Injection and 17: The Art of Fuzzing
 
CNIT 152: 6. Scope & 7. Live Data Collection
CNIT 152: 6. Scope & 7. Live Data CollectionCNIT 152: 6. Scope & 7. Live Data Collection
CNIT 152: 6. Scope & 7. Live Data Collection
 
CNIT 126 11. Malware Behavior
CNIT 126 11. Malware BehaviorCNIT 126 11. Malware Behavior
CNIT 126 11. Malware Behavior
 
Case Study of the Unexplained
Case Study of the UnexplainedCase Study of the Unexplained
Case Study of the Unexplained
 
Sans london april sans at night - tearing apart a fileless malware sample
Sans london april   sans at night - tearing apart a fileless malware sampleSans london april   sans at night - tearing apart a fileless malware sample
Sans london april sans at night - tearing apart a fileless malware sample
 
3. Security Engineering
3. Security Engineering3. Security Engineering
3. Security Engineering
 
CNIT 126: 10: Kernel Debugging with WinDbg
CNIT 126: 10: Kernel Debugging with WinDbgCNIT 126: 10: Kernel Debugging with WinDbg
CNIT 126: 10: Kernel Debugging with WinDbg
 
Esage on non-existent 0-days, stable binary exploits and user interaction
Esage   on non-existent 0-days, stable binary exploits and user interactionEsage   on non-existent 0-days, stable binary exploits and user interaction
Esage on non-existent 0-days, stable binary exploits and user interaction
 
Browser Fuzzing with a Twist (and a Shake) -- ZeroNights 2015
Browser Fuzzing with a Twist (and a Shake) -- ZeroNights 2015Browser Fuzzing with a Twist (and a Shake) -- ZeroNights 2015
Browser Fuzzing with a Twist (and a Shake) -- ZeroNights 2015
 
Steelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with PythonSteelcon 2014 - Process Injection with Python
Steelcon 2014 - Process Injection with Python
 
AV Evasion with the Veil Framework
AV Evasion with the Veil FrameworkAV Evasion with the Veil Framework
AV Evasion with the Veil Framework
 
9: OllyDbg
9: OllyDbg9: OllyDbg
9: OllyDbg
 
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbgPractical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
Practical Malware Analysis: Ch 10: Kernel Debugging with WinDbg
 
Practical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware BehaviorPractical Malware Analysis: Ch 11: Malware Behavior
Practical Malware Analysis: Ch 11: Malware Behavior
 
Case study: formal verification of the Brain Fuck Scheduler
Case study: formal verification of the Brain Fuck SchedulerCase study: formal verification of the Brain Fuck Scheduler
Case study: formal verification of the Brain Fuck Scheduler
 
Spo2 t19 spo2-t19
Spo2 t19 spo2-t19Spo2 t19 spo2-t19
Spo2 t19 spo2-t19
 
CNIT 152 12. Investigating Windows Systems (Part 3)
CNIT 152 12. Investigating Windows Systems (Part 3)CNIT 152 12. Investigating Windows Systems (Part 3)
CNIT 152 12. Investigating Windows Systems (Part 3)
 
From zero to SYSTEM on full disk encrypted windows system
From zero to SYSTEM on full disk encrypted windows systemFrom zero to SYSTEM on full disk encrypted windows system
From zero to SYSTEM on full disk encrypted windows system
 

Similar to .NET Core Summer event 2019 in Brno, CZ - War stories from .NET team -- Karel Zikmund

Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013midnite_runr
 
Vulnerability Inheritance in ICS (English)
Vulnerability Inheritance in ICS (English)Vulnerability Inheritance in ICS (English)
Vulnerability Inheritance in ICS (English)Digital Bond
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudyJohn Adams
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterJohn Adams
 
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...Karel Zikmund
 
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012TEST Huddle
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelDaniel Coupal
 
Security research over Windows #defcon china
Security research over Windows #defcon chinaSecurity research over Windows #defcon china
Security research over Windows #defcon chinaPeter Hlavaty
 
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel ZikmundNDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel ZikmundKarel Zikmund
 
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf....NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...Karel Zikmund
 
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision ProblemUsing ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision ProblemAPNIC
 
There's no magic... until you talk about databases
 There's no magic... until you talk about databases There's no magic... until you talk about databases
There's no magic... until you talk about databasesESUG
 
Matrix, The Year To Date, Ben Parsons, TADSummit 2018
Matrix, The Year To Date, Ben Parsons, TADSummit 2018Matrix, The Year To Date, Ben Parsons, TADSummit 2018
Matrix, The Year To Date, Ben Parsons, TADSummit 2018Alan Quayle
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caringsporst
 

Similar to .NET Core Summer event 2019 in Brno, CZ - War stories from .NET team -- Karel Zikmund (20)

Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
 
Vulnerability Inheritance in ICS (English)
Vulnerability Inheritance in ICS (English)Vulnerability Inheritance in ICS (English)
Vulnerability Inheritance in ICS (English)
 
Surge2012
Surge2012Surge2012
Surge2012
 
Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
 
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
 
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
 
Security research over Windows #defcon china
Security research over Windows #defcon chinaSecurity research over Windows #defcon china
Security research over Windows #defcon china
 
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel ZikmundNDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
 
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf....NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
 
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision ProblemUsing ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
 
There's no magic... until you talk about databases
 There's no magic... until you talk about databases There's no magic... until you talk about databases
There's no magic... until you talk about databases
 
Matrix, The Year To Date, Ben Parsons, TADSummit 2018
Matrix, The Year To Date, Ben Parsons, TADSummit 2018Matrix, The Year To Date, Ben Parsons, TADSummit 2018
Matrix, The Year To Date, Ben Parsons, TADSummit 2018
 
ShaREing Is Caring
ShaREing Is CaringShaREing Is Caring
ShaREing Is Caring
 

More from Karel Zikmund

.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7Karel Zikmund
 
NDC Sydney 2019 - Async Demystified -- Karel Zikmund
NDC Sydney 2019 - Async Demystified -- Karel ZikmundNDC Sydney 2019 - Async Demystified -- Karel Zikmund
NDC Sydney 2019 - Async Demystified -- Karel ZikmundKarel Zikmund
 
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel ZikmundWUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel ZikmundKarel Zikmund
 
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile....NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...Karel Zikmund
 
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel ZikmundKarel Zikmund
 
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel ZikmundDotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel ZikmundKarel Zikmund
 
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel ZikmundKarel Zikmund
 
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek SafarKarel Zikmund
 
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel ZikmundKarel Zikmund
 
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel ZikmundKarel Zikmund
 
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel ZikmundKarel Zikmund
 
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Prague 2017 - .NET Standard -- Karel ZikmundKarel Zikmund
 
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel ZikmundKarel Zikmund
 
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel ZikmundKarel Zikmund
 

More from Karel Zikmund (14)

.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7
 
NDC Sydney 2019 - Async Demystified -- Karel Zikmund
NDC Sydney 2019 - Async Demystified -- Karel ZikmundNDC Sydney 2019 - Async Demystified -- Karel Zikmund
NDC Sydney 2019 - Async Demystified -- Karel Zikmund
 
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel ZikmundWUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
 
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile....NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
 
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
 
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel ZikmundDotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
 
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
 
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
 
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
 
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
 
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
 
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
 
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
 
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
 

Recently uploaded

EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfPower Karaoke
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsMehedi Hasan Shohan
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 

Recently uploaded (20)

EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - Infographic
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
 
XpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software SolutionsXpertSolvers: Your Partner in Building Innovative Software Solutions
XpertSolvers: Your Partner in Building Innovative Software Solutions
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 

.NET Core Summer event 2019 in Brno, CZ - War stories from .NET team -- Karel Zikmund

  • 1. War stories from .NET team .NET Core Summer event 2019 – Brno, CZ Karel Zikmund – @ziki_cz
  • 2. Agenda • Stories • Investigations on .NET team • Not just from me • Lessons learned on the way You won’t see any: • Source code • Debugger Not needed: Deep .NET knowledge Not on agenda
  • 3. My First Serious Investigation • Build lab for Windows component • Build break 1x per week • AccessViolation dialog hangs machine • Toolset updated to 2.0 RTM • Repro: • Once in ~50 runs • Overnight run: 247 crashes out of 77,006 runs (0.3%)
  • 4. My First Serious Investigation - quotes • "The actual crash is occurring on some boilerplate stack checking code …“ • “Karel is relatively new to the code base so he indicated it might take some time to understand what’s going on”
  • 5. mscorwks!UTSemReadWrite::UnlockRead+0xe [f:rtmndpclrsrcutilcodeutsem.cpp @ 357] mscorwks!CMDSemReadWrite::~CMDSemReadWrite+0x14 [f:rtm...mdencrwutil.cpp @ 1299] mscorwks!RegMeta::DefineParam+0x196 [f:rtmndpclrsrcmdcompileremit.cpp @ 2719] cscomp!EMITTER::EmitParamProp cscomp!ParamAttrBind::Init cscomp!ParamAttrBind::CompileParamList cscomp!CLSDREC::compileMethod cscomp!CLSDREC::CompileMember cscomp!CLSDREC::EnumMembersInEmitOrder cscomp!CLSDREC::compileAggregate cscomp!CLSDREC::compileNamespace cscomp!COMPILER::CompileAll cscomp!COMPILER::Compile cscomp!CController::RunCompiler cscomp!CController::Compile csc!main My First Serious Investigation
  • 6. My First Serious Investigation • Who corrupts stack? • GC? • NO! • Changed value between caller and callee • Single bit changed • Who corrupts it? • GC card table updates? • Of course NOT! • What about HW? • Naw! • Or maybe?
  • 7. My First Serious Investigation • Does it by a chance reproduce on only one machine? • Answer: How did you know? • But why always the same callstack? • Good question, no good answer … magic • Lesson learned: Debugging HW errors is costly and hard • Always ask: Does it repro on more than 1 machine?
  • 8. Another MetaData story MetaData format background: • Basically database – rows and columns • Example – TypeDef table: • Indexes into tables/heaps are either 2B or 4B • What happens if last TypeDef has no methods? • MethodList = Number of methods + 1 = max + 1 • What happens if there is 0xffff methods? Flags TypeName TypeNamespace Extends MethodList (Public) “Foo” “Awesome.Story” … Method #10 (Private) “Bar” “Awesome.Story” … Method #11
  • 9. Another MetaData story • II.24.2.6 “#~ stream” • If e is a simple index into a table with index i, it is stored using 2 bytes if table i has less than 2^16 rows, otherwise it is stored using 4 bytes. • II.22.37 TypeDef : 0x02 • 21. If MethodList is non-null, it shall index a valid row in the MethodDef table, where valid means 1 <= row <= rowcount+1 [ERROR] • How do you fix it? • “I’m on the fence whether we should (fix it), given it looks like people hit this about once in 17 years” • https://github.com/dotnet/corefx/issues/29554 • Lesson learned: Not all bugs have to be fixed
  • 10. TypeSystem – Collapsing interfaces • Table of implemented interfaces: class A : I, J {} • With generics: class C<T> : L<T> {} class D<T> : C<T>, L<string> {} class E : D<string>, I {} 0 1 I J 0 1 2 I J K 0 1 L<T> L<string> 0 1 L<string> I 0 1 2 L<string> L<string> I class B : A, K {} Fix:
  • 11. Breaking changes – Intro • Everyone wants fix for their bug • But nobody wants to be broken • Observation: 10% of fixes have unintended side-effects • Extreme case: Perf improvement can break app • How many customers? • Lesson learned: Everything has risk of breaking someone
  • 12. Breaking changes – Last build • Finance app crashing – “last” build of Windows 8 on arm (Surface RT) • Latent bug (introduced months ago) • Bug triggered by: 1. Method in NGen image has to be across 8KB pages 2. GC has to be triggered at least twice when it’s on stack • Unrelated change caused “unlucky” method order for: • System.Net.Configuration.DefaultProxySectionInternal..ctor • Lesson learned: Anything, really ANYTHING, has risk of breaking
  • 13. Breaking changes – Huge impact • Patch to .NET Framework broke certain tax SW • Printing tax forms • Update pushed few days before tax deadline in US • Note: Printing was tested on both sides (Microsoft & tax SW company) • But only into file, not to printer • Lessons learned: Be extra cautious around sensitive dates
  • 14. Breaking changes – Below you • RavenDB – blue screen after KB4487017 on .NET Core! • dotnet/coreclr#22597 • PrefetchVirtualMemory • Kernel memory management bug
  • 15. Networking – Security issue • January: Researcher running ML models on Cosmos • Suspicion about buffers – more logging • March: Repro gone • May: Similar report • +2 weeks: It blows up (more teams & impact) • All hands on-deck • Small repro (20 min, then 1 min) … yay! • TTD trace (iDNA / TTT) … bonus & life saver
  • 16. Networking – Security issue • Root-cause: HTTP pipelining under stress • 13 years old bug (.NET 2.0) Response 1 Request 1 Server Response 1 Request 1 Server Request 2 Response 2
  • 17. Networking – Security issue Request 1 Server Request 2Request 3 Response 1Response 2
  • 18. Networking – Security issue Request 1 Server Request 2Request 3 Response 1Response 2
  • 19. Networking – Security issue • We have workaround (disable pipelining) – perf impact • Worked fix … • Verifying fix … • Repro fails after 4h  • Same symptoms • Repro sensitive to cloud network load (8-17) • TTD (iDNA / TTT) does not work  • Suspicion about buffers again
  • 20. Networking – Security issue • Bad buffer lifetime management – on sending side! • 5 years old bug (.NET 4.5.2) • Trigger found: • Thanks to Skype team – 24h deployment of experiments • Change in .NET 4.7.1 • Fix around the problematic area • Making the opportunity window SMALLER! • … counter-intuitive • Code review – similar bug on receiving side (5 years old) • Same symptoms as HTTP pipelining
  • 21. Networking – Security issue • Why so many customers/services hit it at once? • Maybe Spectre & Meltdown fixes roll out? • or just … magic • Lesson learned: Weird coincidences can happen …
  • 22. Developer’s pride in multi-threading • School project (2000-2003) • Game simulation server – heavily multi-threaded • https://github.com/karelz/WarPlusPlus (nostalgia) • Classic deadlock – 2 threads locking A and B in different order • Deadlock avoidance started make sense • WinRT binder (2010) • Binder is tricky – GC interaction (NO_GC range) • Type routed to WinMD file, assembly meaningless • Negotiated on namespace only in 1 assembly • Multiple reviews, discussions with architects • Bugs start to come in after shipping (NullReferenceException)
  • 23. Optimizations • Once upon a time, … there was a service in Microsoft • List vs. array data structure perf • Perspectives: 1. The data structure will have in practice 3-5 items 2. There 3 hops between servers for each request!!! • Lesson learned: Avoid premature optimizations … at all cost
  • 24. Lessons learned • Always ask: Does it repro on more than 1 machine? • Debugging HW bugs is costly • Some bugs happen once in 17 years • Spec bugs are hard to fix • MetaData format bug • Anything, really ANYTHING, has risk of breaking someone • Innocent changes can trigger latent bugs elsewhere • Impact may be huge – e.g. during tax season • Always try to create small repro • Make your and everyone’s life easier • TTD (iDNA / TTT) is life saver • Avoid premature optimizations … at all cost, save your time • … sometimes there is just … magic @ziki_cz
  • 25. Thank you • Feedback welcome • Twitter DM, email, in-person, etc. • Survey • What you liked vs. not? • Too rushed? • Hard to understand? • Boring? • Didn’t meet your expectations? @ziki_cz

Editor's Notes

  1. Quickly about me: .NET team for almost 14 years Started as junior / out of college on Runtime – C++, pieces like Metadata, TypeSystem, Assembly Loader Later on moved to manager role Then moved to BCL (Base Class Libraries) – Networking area mainly (HttpClient) … working in open-source (.NET Core) Community manager of dotnet/corefx repo
  2. Lessons learned – maybe useful to you Maybe just helps you understand what is happening on the other side / below you I already had few people confirm they hit some/all situations Were able to identify with problems and recommendations
  3. 2006 January – 3 months in MS Large code base, dozens of machines, productivity impact on larger team Crash – “hang dialog” with AV msbuild -> C# compiler Recently upgraded toolset to 2.0 RTM (.NET Framework, not Core ) Repro – great Getting heap dumps We get to see callstack … but before that, some quotes
  4. … in the metadata writer code
  5. Simplified callstack for readability AV in MetaData emitting – defining a parameter Basically stack corruption (dangerous) Proper RW lock Who corrupts memory? …
  6. GC? … not Roslyn – this is native, no GC Why something else? C# compiler is deterministic Go into assembly (x86) – what is arguments, vs. locals * Great exercise to learn/refresh all this in here
  7. Costly and hard … and requires quite some expertise Variants: Different machine setup? … driver bugs Extreme from Maoni: Real HW?
  8. 1 year old story – 2018 May First background on MetaData Compressed indexes = just schema which says 2B, 4B … variable between files, but static/stable and given per file MethodList = Start of list of methods, INCLUSIVE
  9. How do you fix that? … You don’t … spec bug / format bug Changing rules means rewriting & recompiling all tools (CCI and command line tools like ildasm, or UI Reflector, ILSpy, Visual Studio, debuggers, profilers, …) Compensate? Rearranging fields/methods/params in a way the last one does not need the +1. Nasty Emitting fake type/method with field/method/param to push row count to 2^16. Also nasty Using 0 as valid value? Readers will be surprised, maybe other bugs?
  10. 2009/4 story VirtualStubDispatch uses indexes of interfaces to call virtual methods – code in D<T>, casting to K<string> will use index 1 for interface and method index. But if instance of E is passed, method on I (interface index 1) SECURITY issue! Missing feature – implementing in security patch … risk of breaking changes Proper description complicates spec a lot Kind of problem when spec had 1 reference implementation – described the implementation Bug since introduction of generics, many smart people looked at it, yet missed it
  11. Read slides
  12. OEM getting builds 2 days Paranoia
  13. Sensitive dates like tax date, shopping season? (December) … online stores usually have stop on any changes
  14. February case Blue screen – not something CoreCLR should be able to cause Used PrefetchVirtualMemory to optimize perf – rarely used When you’re on cutting edge, pushing the limits, you should be prepared for anything …
  15. Last July (2018) Story starts 8 months earlier in December 2017 Is it server or client problem? … wireshark traces Around Feb, we know it is client - .NET or Windows March – repro is gone (they upgraded cluster) (fast forward 2 months) May another email thread – similar symptoms Back and forth Heated Realize it is 2 different products on the thread And then couple of more start coming in span of 2 weeks Impact on one customer is huge Potential: Data loss Information disclosure – mixing data in multi-tenant scenarios 3-4 weeks of all-hands on deck + 24/7 We had iDNA trace (TTD / TTT)
  16. What happens when requests are cancelled? If 1st – close connection If last – remove it & and mark for closing If in middle – remove it & and mark for closing
  17. Bad things can happen – imagine you asked: “Does the data exist?” … data loss Multi-tenant scenarios: “Give me data about customer X” … data about Y
  18. Added logging (ETW) – reused buffers Old code – track down bad buffer management
  19. Story begins at University (2000-2003) - Large project (2.5 years, 1M+ lines of code, 5 people)
  20. Something like SharePoint Online, Bing, O365 OneNote – where response to customer matters Caring about perf early (with measurements!) != premature optimization
  21. Help me do better job next time