War stories
from .NET team
NDC Oslo 2019
Karel Zikmund – @ziki_cz
Agenda
• Stories
• Investigations on .NET team
• Not just from me
• Lessons learned on the way
Not on agenda
You won’t see any:
• Source code
• Debugger
Not needed: Deep .NET knowledge
My First Serious Investigation
• Build lab for Windows component
• Build break 1x per week
• AccessViolation dialog hangs machine
• Toolset updated to 2.0 RTM
• Repro:
• Once in ~50 runs
• Overnight run: 247 crashes out of 77,006 runs (0.3%)
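The overnight numbers can be turned into a rough planning estimate. This is a minimal sketch, assuming every run crashes independently with the same probability (an assumption, not something the slides state); the function names are illustrative only.

```python
import math


def crash_rate(crashes, runs):
    """Observed fraction of runs that crashed."""
    return crashes / runs


def runs_for_confidence(rate, confidence=0.95):
    """Runs needed to see at least one crash with the given probability.

    Assumes independent runs with a constant crash rate (an assumption).
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - rate))


rate = crash_rate(247, 77_006)
print(f"{rate:.2%} of runs crashed")
print(runs_for_confidence(rate), "runs for a 95% chance of at least one crash")
```

At a rate this low, a single failed attempt to reproduce says very little, which is why an unattended overnight loop was the practical way to get data.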
My First Serious Investigation
• “The actual crash is occurring on some boilerplate stack checking code …”
• “Karel is relatively new to the code base so he indicated it might take some time to understand what’s going on”
mscorwks!UTSemReadWrite::UnlockRead+0xe [f:\rtm\ndp\clr\src\utilcode\utsem.cpp @ 357]
mscorwks!CMDSemReadWrite::~CMDSemReadWrite+0x14 [f:\rtm\...\md\enc\rwutil.cpp @ 1299]
mscorwks!RegMeta::DefineParam+0x196 [f:\rtm\ndp\clr\src\md\compiler\emit.cpp @ 2719]
cscomp!EMITTER::EmitParamProp
cscomp!ParamAttrBind::Init
cscomp!ParamAttrBind::CompileParamList
cscomp!CLSDREC::compileMethod
cscomp!CLSDREC::CompileMember
cscomp!CLSDREC::EnumMembersInEmitOrder
cscomp!CLSDREC::compileAggregate
cscomp!CLSDREC::compileNamespace
cscomp!COMPILER::CompileAll
cscomp!COMPILER::Compile
cscomp!CController::RunCompiler
cscomp!CController::Compile
csc!main
My First Serious Investigation
• Who corrupts stack?
• GC?
• NO!
• Changed value between caller and callee
• Single bit changed
• Who corrupts it?
• GC card table updates?
• Of course NOT!
• What about HW?
• Naw!
• Or maybe?
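A value that differs from what the caller passed by exactly one bit is a classic hint of a hardware fault rather than a software bug. The check can be sketched as follows; the sample values are hypothetical and do not come from the actual investigation.

```python
def flipped_bits(expected, observed):
    """Return the positions of the bits that differ between two integers."""
    diff = expected ^ observed
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


def is_single_bit_flip(expected, observed):
    """True when the two values differ in exactly one bit."""
    diff = expected ^ observed
    # A non-zero value with one bit set has no bits in common with itself minus one.
    return diff != 0 and (diff & (diff - 1)) == 0


caller_value = 0x0012F9A4                  # hypothetical value seen in the caller
callee_value = caller_value ^ (1 << 13)    # same value with bit 13 flipped
print(is_single_bit_flip(caller_value, callee_value), flipped_bits(caller_value, callee_value))
```

Software corruption usually overwrites whole bytes or words, so a difference that passes `is_single_bit_flip` is a reason to take the hardware question seriously.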
My First Serious Investigation
• Does it by any chance reproduce on only one machine?
• Answer: How did you know?
• But why always the same callstack?
• Good question, no good answer … magic
• Lesson learned: Debugging HW errors is costly and hard
• Always ask: Does it repro on more than 1 machine?
Another MetaData story
MetaData format background:
• Basically a database – rows and columns
• Example – TypeDef table:
Flags      TypeName  TypeNamespace    Extends  MethodList
(Public)   “Foo”     “Awesome.Story”  …        Method #10
(Private)  “Bar”     “Awesome.Story”  …        Method #11
• Indexes into tables/heaps are either 2B or 4B
• What happens if last TypeDef has no methods?
• MethodList = Number of methods + 1 = max + 1
• What happens if there are 0xffff methods?
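How MethodList assigns methods to types can be sketched in a few lines. Each TypeDef row owns the MethodDef rows from its own MethodList up to, but not including, the next row's MethodList; the last row runs to the end of the MethodDef table. The function name and the row counts in the usage lines are illustrative assumptions.

```python
def method_ranges(method_lists, method_row_count):
    """Map each TypeDef row to the range of 1-based MethodDef rows it owns.

    method_lists     -- MethodList column, one entry per TypeDef row
    method_row_count -- number of rows in the MethodDef table
    """
    ranges = []
    for i, first in enumerate(method_lists):
        if i + 1 < len(method_lists):
            end = method_lists[i + 1]        # run stops where the next type starts
        else:
            end = method_row_count + 1       # last type: run stops after the last row
        ranges.append(range(first, end))
    return ranges


# "Foo" starts at Method #10, "Bar" at Method #11; assume 12 MethodDef rows.
foo, bar = method_ranges([10, 11], 12)
print(list(foo), list(bar))

# A last type without methods needs MethodList = row count + 1 to get an empty run.
print(len(method_ranges([1, 4, 6], 5)[-1]))
```

The second call shows why an empty last type forces a MethodList value one past the end of the table.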
Another MetaData story
• II.24.2.6 “#~ stream”
• If e is a simple index into a table with index i, it is stored using 2 bytes if table i has less than
2^16 rows, otherwise it is stored using 4 bytes.
• II.22.37 TypeDef : 0x02
• 21. If MethodList is non-null, it shall index a valid row in the MethodDef table, where valid
means 1 <= row <= rowcount+1 [ERROR]
• How do you fix it?
• “I’m on the fence whether we should (fix it), given it looks like people hit this about once in 17
years”
• https://github.com/dotnet/corefx/issues/29554
• Lesson learned: Not all bugs have to be fixed
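The two quoted rules appear to collide at exactly 0xffff methods. The sketch below is one reading of the quoted spec text, not the reader or writer code of any real library; the helper names are hypothetical.

```python
def simple_index_size(row_count):
    """Index width in bytes per the quoted II.24.2.6 rule."""
    return 2 if row_count < 2**16 else 4


def fits(value, size_in_bytes):
    """True when an unsigned value can be stored in the given number of bytes."""
    return 0 <= value < (1 << (8 * size_in_bytes))


method_rows = 0xFFFF
index_size = simple_index_size(method_rows)     # still a 2-byte index
empty_last_type = method_rows + 1               # rowcount + 1, allowed by II.22.37 rule 21

print(index_size, hex(empty_last_type), fits(empty_last_type, index_size))
```

With 0xfffe methods the value rowcount + 1 still fits in two bytes, and with 0x10000 methods the index is already four bytes wide, so only the single row count 0xffff is affected under this reading.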
Breaking changes – Intro
• Everyone wants fix for their bug
• But nobody wants to be broken
• Observation: 10% of fixes have unintended side-effects
• Extreme case: Perf improvement can break app
• How many customers?
• Lesson learned: Everything has risk of breaking someone
Breaking changes – Last build
• Finance app crashing – “last” build of Windows 8 on ARM (Surface RT)
• Latent bug (introduced months ago)
• Bug triggered by:
1. Method in NGen image has to be across 8KB pages
2. GC has to be triggered at least twice when it’s on stack
• Unrelated change caused “unlucky” method order for:
• System.Net.Configuration.DefaultProxySectionInternal..ctor
• Lesson learned: Anything, really ANYTHING, has risk of breaking
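One possible reading of trigger 1 is that the method's code straddles a page boundary in the image. Under that assumption, the sketch below shows how an unrelated change that shifts a method by a few bytes can flip the condition; the offsets and sizes are made up for illustration.

```python
PAGE_SIZE = 8 * 1024  # 8KB, as on the slide


def crosses_page_boundary(start, length, page_size=PAGE_SIZE):
    """True when the byte range [start, start + length) spans two or more pages."""
    first_page = start // page_size
    last_page = (start + length - 1) // page_size
    return first_page != last_page


method_size = 0xF0
before = crosses_page_boundary(0x1F00, method_size)          # method fits inside one page
after = crosses_page_boundary(0x1F00 + 0x20, method_size)    # earlier code grew by 0x20 bytes
print(before, after)
```

Nothing in the method itself changed between the two calls, only its position, which matches the "unlucky method order" described above.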
Breaking changes – Huge impact
• Patch to .NET Framework broke certain tax SW
• Printing tax forms
• Update pushed a few days before the US tax deadline
• Note: Printing was tested on both sides (Microsoft & tax SW company)
• But only into file, not to printer
• Lesson learned: Be extra cautious around sensitive dates
Networking – Security issue
• January: Researcher running ML models on Cosmos
• Suspicion about buffers – more logging
• March: Repro gone
• May: Similar report
• +2 weeks: It blows up (more teams & impact)
• All hands on deck
• Small repro (20 min, then 1 min) … yay!
• TTD trace (iDNA / TTT) … bonus & life saver
Networking – Security issue
• Root-cause: HTTP pipelining under stress
• 13-year-old bug (.NET 2.0)
[Diagram: client sends Request 1 to Server and receives Response 1, then sends Request 2 and receives Response 2]
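HTTP pipelining sends several requests on one connection without waiting for the earlier responses, and the only thing tying a response to its request is arrival order. The toy model below illustrates just that property. The slides do not describe the actual defect, so this is not a reconstruction of the bug, and the class name is hypothetical.

```python
from collections import deque


class PipelinedConnection:
    """Toy model: responses are matched to requests first in, first out."""

    def __init__(self):
        self._awaiting = deque()

    def send(self, request_id):
        # The request goes out immediately; remember it until its response arrives.
        self._awaiting.append(request_id)

    def receive(self, response_body):
        # The oldest outstanding request gets whatever arrives next.
        request_id = self._awaiting.popleft()
        return request_id, response_body


conn = PipelinedConnection()
for request in ("Request 1", "Request 2", "Request 3"):
    conn.send(request)

pairs = [conn.receive(body) for body in ("Response 1", "Response 2", "Response 3")]
print(pairs)
```

Because no identifier travels with the response, any bookkeeping slip in the queue would hand one caller a response meant for another.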
Networking – Security issue
[Diagram: Request 1, Request 2 and Request 3 sent to Server back to back; Response 1 and Response 2 coming back]
Networking – Security issue
• We have workaround (disable pipelining) – perf impact
• Worked on fix …
• Verifying fix …
• Repro still fails after 4h
• Same symptoms
• Repro sensitive to cloud network load (8-17)
• TTD (iDNA / TTT) does not work
• Suspicion about buffers again
Networking – Security issue
• Bad buffer lifetime management – on sending side!
• 5-year-old bug (.NET 4.5.2)
• Trigger found:
• Thanks to Skype team – 24h deployment of experiments
• Change in .NET 4.7.1
• Fix around the problematic area
• Making the opportunity window SMALLER!
• … counter-intuitive
• Code review – similar bug on receiving side (5 years old)
• Same symptoms as HTTP pipelining
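A buffer lifetime bug on the sending side can be illustrated generically: a buffer goes back to a pool while a send that still refers to it has not finished. The sketch below is a deliberately simplified, single-threaded model with hypothetical names. It is not the .NET code in question.

```python
class BufferPool:
    """Minimal pool that hands out and takes back bytearray buffers."""

    def __init__(self, size):
        self._size = size
        self._free = []

    def rent(self):
        return self._free.pop() if self._free else bytearray(self._size)

    def give_back(self, buffer):
        self._free.append(buffer)


def demo():
    pool = BufferPool(16)

    buffer = pool.rent()
    buffer[:5] = b"AAAAA"
    pending_send = (buffer, 5)      # the send is queued but has not read the bytes yet

    pool.give_back(buffer)          # BUG: returned while the send is still pending

    other = pool.rent()             # another request gets the very same buffer
    other[:5] = b"BBBBB"

    data, length = pending_send     # the delayed send finally reads the buffer
    return bytes(data[:length])


print(demo())
```

The first sender ends up transmitting the second sender's bytes. In real code the window between returning the buffer and completing the send is usually tiny, which is consistent with a bug that only shows up under load.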
Networking – Security issue
• Why so many customers/services hit it at once?
• Maybe the rollout of Spectre & Meltdown fixes?
• or just … magic
• Lesson learned: Weird coincidences can happen …
Lessons learned
• Always ask: Does it repro on more than 1 machine?
• Debugging HW bugs is costly
• Some bugs happen once in 17 years
• Spec bugs are hard to fix
• MetaData format bug
• Anything, really ANYTHING, has risk of breaking someone
• Innocent changes can trigger latent bugs elsewhere
• Impact may be huge – e.g. during tax season
• Always try to create small repro
• Make your and everyone’s life easier
• TTD (iDNA / TTT) is life saver
• … sometimes there is just … magic
Thank you
• Feedback welcome
• Twitter DM, email, in-person, etc.
• What you liked vs. not?
• Too rushed?
• Hard to understand?
• Boring?
• Didn’t meet your expectations?
@ziki_cz

More Related Content

What's hot

CNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic Analysis
CNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic AnalysisCNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic Analysis
CNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic Analysis
Sam Bowne
 
50 Shades of Fuzzing by Peter Hlavaty & Marco Grassi
50 Shades of Fuzzing by Peter Hlavaty & Marco Grassi50 Shades of Fuzzing by Peter Hlavaty & Marco Grassi
50 Shades of Fuzzing by Peter Hlavaty & Marco Grassi
Shakacon
 
YearUp: Hacking for Jobs
YearUp: Hacking for JobsYearUp: Hacking for Jobs
YearUp: Hacking for Jobs
Sam Bowne
 
Adversarial Post-Ex: Lessons From The Pros
Adversarial Post-Ex: Lessons From The ProsAdversarial Post-Ex: Lessons From The Pros
Adversarial Post-Ex: Lessons From The Pros
Justin Warner
 
0day hunting a.k.a. The story of a proper CPE test
0day hunting a.k.a. The story of a proper CPE test0day hunting a.k.a. The story of a proper CPE test
0day hunting a.k.a. The story of a proper CPE test
Balazs Bucsay
 
Sans london april sans at night - tearing apart a fileless malware sample
Sans london april   sans at night - tearing apart a fileless malware sampleSans london april   sans at night - tearing apart a fileless malware sample
Sans london april sans at night - tearing apart a fileless malware sample
Michel Coene
 
Reverse Engineering the TomTom Runner pt. 1
Reverse Engineering the TomTom Runner pt. 1 Reverse Engineering the TomTom Runner pt. 1
Reverse Engineering the TomTom Runner pt. 1
Luis Grangeia
 
Practical Malware Analysis: Ch 2 Malware Analysis in Virtual Machines & 3: Ba...
Practical Malware Analysis: Ch 2 Malware Analysis in Virtual Machines & 3: Ba...Practical Malware Analysis: Ch 2 Malware Analysis in Virtual Machines & 3: Ba...
Practical Malware Analysis: Ch 2 Malware Analysis in Virtual Machines & 3: Ba...
Sam Bowne
 
CheckPlease: Payload-Agnostic Targeted Malware
CheckPlease: Payload-Agnostic Targeted MalwareCheckPlease: Payload-Agnostic Targeted Malware
CheckPlease: Payload-Agnostic Targeted Malware
Brandon Arvanaghi
 
Network Forensics and Practical Packet Analysis
Network Forensics and Practical Packet AnalysisNetwork Forensics and Practical Packet Analysis
Network Forensics and Practical Packet Analysis
Priyanka Aash
 
When is something overflowing
When is something overflowingWhen is something overflowing
When is something overflowing
Peter Hlavaty
 
Breadcrumbs to Loaves: BSides Austin '17
Breadcrumbs to Loaves: BSides Austin '17Breadcrumbs to Loaves: BSides Austin '17
Breadcrumbs to Loaves: BSides Austin '17
Brandon Arvanaghi
 
CNIT 127 Ch 16: Fault Injection and 17: The Art of Fuzzing
CNIT 127 Ch 16: Fault Injection and 17: The Art of FuzzingCNIT 127 Ch 16: Fault Injection and 17: The Art of Fuzzing
CNIT 127 Ch 16: Fault Injection and 17: The Art of Fuzzing
Sam Bowne
 
Honeypots, Cybercompetitions, and Bug Bounties
Honeypots, Cybercompetitions, and Bug BountiesHoneypots, Cybercompetitions, and Bug Bounties
Honeypots, Cybercompetitions, and Bug Bounties
Sam Bowne
 
CNIT 152: 6. Scope & 7. Live Data Collection
CNIT 152: 6. Scope & 7. Live Data CollectionCNIT 152: 6. Scope & 7. Live Data Collection
CNIT 152: 6. Scope & 7. Live Data Collection
Sam Bowne
 
OTP, Concurrency and Testing Strategies
OTP, Concurrency and Testing StrategiesOTP, Concurrency and Testing Strategies
OTP, Concurrency and Testing Strategies
Adrián Mugnolo
 
An EyeWitness View into your Network
An EyeWitness View into your NetworkAn EyeWitness View into your Network
An EyeWitness View into your Network
CTruncer
 
CNIT 152 12. Investigating Windows Systems (Part 3)
CNIT 152 12. Investigating Windows Systems (Part 3)CNIT 152 12. Investigating Windows Systems (Part 3)
CNIT 152 12. Investigating Windows Systems (Part 3)
Sam Bowne
 
CNIT 126 11. Malware Behavior
CNIT 126 11. Malware BehaviorCNIT 126 11. Malware Behavior
CNIT 126 11. Malware Behavior
Sam Bowne
 
3. Security Engineering
3. Security Engineering3. Security Engineering
3. Security Engineering
Sam Bowne
 

What's hot (20)

CNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic Analysis
CNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic AnalysisCNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic Analysis
CNIT 126 2: Malware Analysis in Virtual Machines & 3: Basic Dynamic Analysis
 
50 Shades of Fuzzing by Peter Hlavaty & Marco Grassi
50 Shades of Fuzzing by Peter Hlavaty & Marco Grassi50 Shades of Fuzzing by Peter Hlavaty & Marco Grassi
50 Shades of Fuzzing by Peter Hlavaty & Marco Grassi
 
YearUp: Hacking for Jobs
YearUp: Hacking for JobsYearUp: Hacking for Jobs
YearUp: Hacking for Jobs
 
Adversarial Post-Ex: Lessons From The Pros
Adversarial Post-Ex: Lessons From The ProsAdversarial Post-Ex: Lessons From The Pros
Adversarial Post-Ex: Lessons From The Pros
 
0day hunting a.k.a. The story of a proper CPE test
0day hunting a.k.a. The story of a proper CPE test0day hunting a.k.a. The story of a proper CPE test
0day hunting a.k.a. The story of a proper CPE test
 
Sans london april sans at night - tearing apart a fileless malware sample
Sans london april   sans at night - tearing apart a fileless malware sampleSans london april   sans at night - tearing apart a fileless malware sample
Sans london april sans at night - tearing apart a fileless malware sample
 
Reverse Engineering the TomTom Runner pt. 1
Reverse Engineering the TomTom Runner pt. 1 Reverse Engineering the TomTom Runner pt. 1
Reverse Engineering the TomTom Runner pt. 1
 
Practical Malware Analysis: Ch 2 Malware Analysis in Virtual Machines & 3: Ba...
Practical Malware Analysis: Ch 2 Malware Analysis in Virtual Machines & 3: Ba...Practical Malware Analysis: Ch 2 Malware Analysis in Virtual Machines & 3: Ba...
Practical Malware Analysis: Ch 2 Malware Analysis in Virtual Machines & 3: Ba...
 
CheckPlease: Payload-Agnostic Targeted Malware
CheckPlease: Payload-Agnostic Targeted MalwareCheckPlease: Payload-Agnostic Targeted Malware
CheckPlease: Payload-Agnostic Targeted Malware
 
Network Forensics and Practical Packet Analysis
Network Forensics and Practical Packet AnalysisNetwork Forensics and Practical Packet Analysis
Network Forensics and Practical Packet Analysis
 
When is something overflowing
When is something overflowingWhen is something overflowing
When is something overflowing
 
Breadcrumbs to Loaves: BSides Austin '17
Breadcrumbs to Loaves: BSides Austin '17Breadcrumbs to Loaves: BSides Austin '17
Breadcrumbs to Loaves: BSides Austin '17
 
CNIT 127 Ch 16: Fault Injection and 17: The Art of Fuzzing
CNIT 127 Ch 16: Fault Injection and 17: The Art of FuzzingCNIT 127 Ch 16: Fault Injection and 17: The Art of Fuzzing
CNIT 127 Ch 16: Fault Injection and 17: The Art of Fuzzing
 
Honeypots, Cybercompetitions, and Bug Bounties
Honeypots, Cybercompetitions, and Bug BountiesHoneypots, Cybercompetitions, and Bug Bounties
Honeypots, Cybercompetitions, and Bug Bounties
 
CNIT 152: 6. Scope & 7. Live Data Collection
CNIT 152: 6. Scope & 7. Live Data CollectionCNIT 152: 6. Scope & 7. Live Data Collection
CNIT 152: 6. Scope & 7. Live Data Collection
 
OTP, Concurrency and Testing Strategies
OTP, Concurrency and Testing StrategiesOTP, Concurrency and Testing Strategies
OTP, Concurrency and Testing Strategies
 
An EyeWitness View into your Network
An EyeWitness View into your NetworkAn EyeWitness View into your Network
An EyeWitness View into your Network
 
CNIT 152 12. Investigating Windows Systems (Part 3)
CNIT 152 12. Investigating Windows Systems (Part 3)CNIT 152 12. Investigating Windows Systems (Part 3)
CNIT 152 12. Investigating Windows Systems (Part 3)
 
CNIT 126 11. Malware Behavior
CNIT 126 11. Malware BehaviorCNIT 126 11. Malware Behavior
CNIT 126 11. Malware Behavior
 
3. Security Engineering
3. Security Engineering3. Security Engineering
3. Security Engineering
 

Similar to NDC Oslo 2019 - War stories from .NET team -- Karel Zikmund

Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
midnite_runr
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
xlight
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
Roger Xia
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
liujianrong
 
Case Study of the Unexplained
Case Study of the UnexplainedCase Study of the Unexplained
Case Study of the Unexplained
shannomc
 
Surge2012
Surge2012Surge2012
Surge2012
davidapacheco
 
Debugging multiplayer games
Debugging multiplayer gamesDebugging multiplayer games
Debugging multiplayer games
Maciej Siniło
 
Esage on non-existent 0-days, stable binary exploits and user interaction
Esage   on non-existent 0-days, stable binary exploits and user interactionEsage   on non-existent 0-days, stable binary exploits and user interaction
Esage on non-existent 0-days, stable binary exploits and user interaction
DefconRussia
 
It summit 150604 cb_wcl_ld_kmh_v6_to_publish
It summit 150604 cb_wcl_ld_kmh_v6_to_publishIt summit 150604 cb_wcl_ld_kmh_v6_to_publish
It summit 150604 cb_wcl_ld_kmh_v6_to_publish
kevin_donovan
 
On non existent 0-days, stable binary exploits and
On non existent 0-days, stable binary exploits andOn non existent 0-days, stable binary exploits and
On non existent 0-days, stable binary exploits and
Alisa Esage Шевченко
 
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision ProblemUsing ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
APNIC
 
Vulnerability Inheritance in ICS (English)
Vulnerability Inheritance in ICS (English)Vulnerability Inheritance in ICS (English)
Vulnerability Inheritance in ICS (English)
Digital Bond
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
John Adams
 
Security research over Windows #defcon china
Security research over Windows #defcon chinaSecurity research over Windows #defcon china
Security research over Windows #defcon china
Peter Hlavaty
 
Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
Rajagopal Nagarajan
 
There's no magic... until you talk about databases
 There's no magic... until you talk about databases There's no magic... until you talk about databases
There's no magic... until you talk about databases
ESUG
 
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel ZikmundNDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
Karel Zikmund
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
John Adams
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Daniel Coupal
 

Similar to NDC Oslo 2019 - War stories from .NET team -- Karel Zikmund (20)

Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Case Study of the Unexplained
Case Study of the UnexplainedCase Study of the Unexplained
Case Study of the Unexplained
 
Surge2012
Surge2012Surge2012
Surge2012
 
Debugging multiplayer games
Debugging multiplayer gamesDebugging multiplayer games
Debugging multiplayer games
 
Esage on non-existent 0-days, stable binary exploits and user interaction
Esage   on non-existent 0-days, stable binary exploits and user interactionEsage   on non-existent 0-days, stable binary exploits and user interaction
Esage on non-existent 0-days, stable binary exploits and user interaction
 
It summit 150604 cb_wcl_ld_kmh_v6_to_publish
It summit 150604 cb_wcl_ld_kmh_v6_to_publishIt summit 150604 cb_wcl_ld_kmh_v6_to_publish
It summit 150604 cb_wcl_ld_kmh_v6_to_publish
 
On non existent 0-days, stable binary exploits and
On non existent 0-days, stable binary exploits andOn non existent 0-days, stable binary exploits and
On non existent 0-days, stable binary exploits and
 
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision ProblemUsing ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
Using ~300 Billion DNS Queries to Analyse the TLD Name Collision Problem
 
Vulnerability Inheritance in ICS (English)
Vulnerability Inheritance in ICS (English)Vulnerability Inheritance in ICS (English)
Vulnerability Inheritance in ICS (English)
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
 
Security research over Windows #defcon china
Security research over Windows #defcon chinaSecurity research over Windows #defcon china
Security research over Windows #defcon china
 
Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
 
There's no magic... until you talk about databases
 There's no magic... until you talk about databases There's no magic... until you talk about databases
There's no magic... until you talk about databases
 
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel ZikmundNDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
 
John adams talk cloudy
John adams   talk cloudyJohn adams   talk cloudy
John adams talk cloudy
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
 

More from Karel Zikmund

.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7
Karel Zikmund
 
NDC Sydney 2019 - Async Demystified -- Karel Zikmund
NDC Sydney 2019 - Async Demystified -- Karel ZikmundNDC Sydney 2019 - Async Demystified -- Karel Zikmund
NDC Sydney 2019 - Async Demystified -- Karel Zikmund
Karel Zikmund
 
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel ZikmundWUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
Karel Zikmund
 
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile....NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
Karel Zikmund
 
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
Karel Zikmund
 
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf....NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
Karel Zikmund
 
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel ZikmundDotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
Karel Zikmund
 
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
Karel Zikmund
 
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
Karel Zikmund
 
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
Karel Zikmund
 
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
Karel Zikmund
 
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
Karel Zikmund
 
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
Karel Zikmund
 
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
Karel Zikmund
 
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
Karel Zikmund
 
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
Karel Zikmund
 

More from Karel Zikmund (16)

.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7
 
NDC Sydney 2019 - Async Demystified -- Karel Zikmund
NDC Sydney 2019 - Async Demystified -- Karel ZikmundNDC Sydney 2019 - Async Demystified -- Karel Zikmund
NDC Sydney 2019 - Async Demystified -- Karel Zikmund
 
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel ZikmundWUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
 
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile....NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
 
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
 
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf....NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
 
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel ZikmundDotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
 
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
 
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
 
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
 
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
 
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
 
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
 
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
 
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
 
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
 

NDC Oslo 2019 - War stories from .NET team -- Karel Zikmund

  • 1. War stories from .NET team NDC Oslo 2019 Karel Zikmund – @ziki_cz
  • 2.
  • 3. Agenda • Stories • Investigations on .NET team • Not just from me • Lessons learned on the way You won’t see any: • Source code • Debugger Not needed: Deep .NET knowledge Not on agenda
  • 4.
  • 5. My First Serious Investigation • Build lab for Windows component • Build break 1x per week • AccessViolation dialog hangs machine • Toolset updated to 2.0 RTM • Repro: • Once in ~50 runs • Overnight run: 247 crashes out of 77,006 runs (0.3%)
  • 6. My First Serious Investigation • “The actual crash is occurring on some boilerplate stack checking code …” • “Karel is relatively new to the code base so he indicated it might take some time to understand what’s going on”
  • 7. My First Serious Investigation
      mscorwks!UTSemReadWrite::UnlockRead+0xe [f:\rtm\ndp\clr\src\utilcode\utsem.cpp @ 357]
      mscorwks!CMDSemReadWrite::~CMDSemReadWrite+0x14 [f:\rtm\...\md\enc\rwutil.cpp @ 1299]
      mscorwks!RegMeta::DefineParam+0x196 [f:\rtm\ndp\clr\src\md\compiler\emit.cpp @ 2719]
      cscomp!EMITTER::EmitParamProp
      cscomp!ParamAttrBind::Init
      cscomp!ParamAttrBind::CompileParamList
      cscomp!CLSDREC::compileMethod
      cscomp!CLSDREC::CompileMember
      cscomp!CLSDREC::EnumMembersInEmitOrder
      cscomp!CLSDREC::compileAggregate
      cscomp!CLSDREC::compileNamespace
      cscomp!COMPILER::CompileAll
      cscomp!COMPILER::Compile
      cscomp!CController::RunCompiler
      cscomp!CController::Compile
      csc!main
  • 8. My First Serious Investigation • Who corrupts stack? • GC? • NO! • Changed value between caller and callee • Single bit changed • Who corrupts it? • GC card table updates? • Of course NOT! • What about HW? • Naw! • Or maybe?
  • 9. My First Serious Investigation • Does it by a chance reproduce on only one machine? • Answer: How did you know? • But why always the same callstack? • Good question, no good answer … magic • Lesson learned: Debugging HW errors is costly and hard • Always ask: Does it repro on more than 1 machine?
  • 10. Another MetaData story MetaData format background: • Basically database – rows and columns • Example – TypeDef table (rows shown under this item) • Indexes into tables/heaps are either 2B or 4B • What happens if the last TypeDef has no methods? • MethodList = Number of methods + 1 = max + 1 • What happens if there are 0xffff methods?
      Flags      TypeName  TypeNamespace    Extends  MethodList
      (Public)   “Foo”     “Awesome.Story”  …        Method #10
      (Private)  “Bar”     “Awesome.Story”  …        Method #11
  • 11. Another MetaData story • II.24.2.6 “#~ stream” • If e is a simple index into a table with index i, it is stored using 2 bytes if table i has less than 2^16 rows, otherwise it is stored using 4 bytes. • II.22.37 TypeDef : 0x02 • 21. If MethodList is non-null, it shall index a valid row in the MethodDef table, where valid means 1 <= row <= rowcount+1 [ERROR] • How do you fix it? • “I’m on the fence whether we should (fix it), given it looks like people hit this about once in 17 years” • https://github.com/dotnet/corefx/issues/29554 • Lesson learned: Not all bugs have to be fixed
  • 12. Breaking changes – Intro • Everyone wants fix for their bug • But nobody wants to be broken • Observation: 10% of fixes have unintended side-effects • Extreme case: Perf improvement can break app • How many customers? • Lesson learned: Everything has risk of breaking someone
  • 13. Breaking changes – Last build • Finance app crashing – “last” build of Windows 8 on arm (Surface RT) • Latent bug (introduced months ago) • Bug triggered by: 1. Method in NGen image has to be across 8KB pages 2. GC has to be triggered at least twice when it’s on stack • Unrelated change caused “unlucky” method order for: • System.Net.Configuration.DefaultProxySectionInternal..ctor • Lesson learned: Anything, really ANYTHING, has risk of breaking
  • 14. Breaking changes – Huge impact • Patch to .NET Framework broke certain tax SW • Printing tax forms • Update pushed a few days before the tax deadline in the US • Note: Printing was tested on both sides (Microsoft & tax SW company) • But only into file, not to printer • Lesson learned: Be extra cautious around sensitive dates
  • 15. Networking – Security issue • January: Researcher running ML models on Cosmos • Suspicion about buffers – more logging • March: Repro gone • May: Similar report • +2 weeks: It blows up (more teams & impact) • All hands on deck • Small repro (20 min, then 1 min) … yay! • TTD trace (iDNA / TTT) … bonus & life saver
  • 16. Networking – Security issue • Root-cause: HTTP pipelining under stress • 13 years old bug (.NET 2.0) [Diagram: client and server exchanging Request 1 / Response 1, then Request 1 and Request 2 / Response 1 and Response 2]
  • 17. Networking – Security issue [Diagram: Requests 1, 2 and 3 sent to the server; Responses 1 and 2 returned]
  • 18. Networking – Security issue [Diagram: Requests 1, 2 and 3 sent to the server; Responses 1 and 2 returned]
  • 19. Networking – Security issue • We have workaround (disable pipelining) – perf impact • Worked fix … • Verifying fix … • Repro fails after 4h :( • Same symptoms • Repro sensitive to cloud network load (8-17) • TTD (iDNA / TTT) does not work :( • Suspicion about buffers again
  • 20. Networking – Security issue • Bad buffer lifetime management – on sending side! • 5 years old bug (.NET 4.5.2) • Trigger found: • Thanks to Skype team – 24h deployment of experiments • Change in .NET 4.7.1 • Fix around the problematic area • Making the opportunity window SMALLER! • … counter-intuitive • Code review – similar bug on receiving side (5 years old) • Same symptoms as HTTP pipelining
  • 21. Networking – Security issue • Why so many customers/services hit it at once? • Maybe Spectre & Meltdown fixes roll out? • or just … magic • Lesson learned: Weird coincidences can happen …
  • 22. Lessons learned • Always ask: Does it repro on more than 1 machine? • Debugging HW bugs is costly • Some bugs happen once in 17 years • Spec bugs are hard to fix • MetaData format bug • Anything, really ANYTHING, has risk of breaking someone • Innocent changes can trigger latent bugs elsewhere • Impact may be huge – e.g. during tax season • Always try to create small repro • Make your and everyone’s life easier • TTD (iDNA / TTT) is life saver • … sometimes there is just … magic @ziki_cz
  • 23. Thank you • Feedback welcome • Twitter DM, email, in-person, etc. • What you liked vs. not? • Too rushed? • Hard to understand? • Boring? • Didn’t meet your expectations? @ziki_cz
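
The MetaData edge case from slides 10 and 11 can be checked with a few lines of arithmetic. What follows is a minimal Python sketch, not code from the .NET team: the function names are hypothetical, and it encodes only the two spec rules quoted on slide 11 (a simple index takes 2 bytes when the table has fewer than 2^16 rows, and MethodList may be as large as rowcount + 1).

```python
def simple_index_size(row_count: int) -> int:
    """Bytes used for a simple index into a table (rule quoted from II.24.2.6)."""
    return 2 if row_count < 2**16 else 4


def max_method_list_value(method_count: int) -> int:
    """Largest MethodList value allowed by II.22.37 rule 21: rowcount + 1."""
    return method_count + 1


def fits(value: int, size_bytes: int) -> bool:
    """True if an unsigned value can be stored in size_bytes bytes."""
    return 0 <= value < 2 ** (8 * size_bytes)


# Walk across the boundary: one row below it, exactly at it, one row above it.
for method_count in (0xFFFE, 0xFFFF, 0x10000):
    size = simple_index_size(method_count)
    end_marker = max_method_list_value(method_count)
    print(
        f"{method_count:#x} methods: {size}-byte index, "
        f"MethodList up to {end_marker:#x}, fits={fits(end_marker, size)}"
    )
```

With exactly 0xffff methods the table still has fewer than 2^16 rows, so indexes are 2 bytes wide, yet a trailing TypeDef with no methods needs MethodList = 0x10000, which does not fit. One row fewer or one row more and the problem disappears, which is consistent with the "once in 17 years" remark on slide 11.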

Editor's Notes

  1. Quickly about me: on the .NET team for almost 14 years. Started as a junior out of college on the Runtime (C++; pieces like Metadata, TypeSystem, Assembly Loader). Later moved to a manager role. Then moved to BCL (Base Class Libraries), mainly the Networking area (HttpClient), working in open source (.NET Core). Community manager of the dotnet/corefx repo.
  2. Advanced .NET Debugging techniques from real world investigations – by Kevin Gosse & Christophe Nasarre (Criteo folks)
  3. January 2006, 3 months into MS. Large code base, dozens of machines, productivity impact on a larger team. Crash: “hang dialog” with AV; msbuild -> C# compiler. Recently upgraded toolset to 2.0 RTM (.NET Framework, not Core). Repro: great. Getting heap dumps. We get to see the callstack … but before that, some quotes.
  4. … in the metadata writer code
  5. Simplified callstack for readability. AV in MetaData emitting: defining a parameter. Basically stack corruption (dangerous). Proper RW lock. Who corrupts memory? …
  6. GC? … This is not Roslyn; it is native code, no GC. Why something else? The C# compiler is deterministic. Go into the assembly (x86): which values are arguments vs. locals. Great exercise to learn/refresh all this.
  7. Costly and hard … and requires quite some expertise. Variants: different machine setup? … driver bugs. Extreme from Maoni: real HW?
  8. A 1-year-old story (May 2018). First, background on MetaData. Compressed indexes = just a schema which says 2B or 4B; variable between files, but static/stable and given per file. MethodList = start of the list of methods, INCLUSIVE.
  9. How do you fix that? … You don’t: it is a spec bug / format bug. Changing the rules means rewriting and recompiling all tools (CCI and command-line tools like ildasm, UI tools like Reflector, ILSpy, Visual Studio, debuggers, profilers, …). Compensate? (a) Rearrange fields/methods/params so that the last one does not need the +1: nasty. (b) Emit a fake type/method with a field/method/param to push the row count to 2^16: also nasty. (c) Use 0 as a valid value? Readers will be surprised, maybe other bugs?
  10. Read slides
  11. OEM getting builds 2 days Paranoia
  12. Sensitive dates like tax date, shopping season? (December) … online stores usually have stop on any changes
  13. Last July (2018). The story starts 8 months earlier, in December 2017. Is it a server or a client problem? … Wireshark traces. Around February, we know it is the client: .NET or Windows. March: repro is gone (they upgraded the cluster). Fast forward 2 months: in May, another email thread with similar symptoms. Back and forth, heated. We realize there are 2 different products on the thread. Then a couple more reports come in within a span of 2 weeks. Impact on one customer is huge. Potential: data loss; information disclosure (mixing data in multi-tenant scenarios). 3-4 weeks of all hands on deck, 24/7. We had an iDNA trace (TTD / TTT).
  14. What happens when requests are cancelled? If 1st: close connection. If last: remove it and mark for closing. If in the middle: remove it and mark for closing.
  15. Bad things can happen. Imagine you asked: “Does the data exist?” … data loss. Multi-tenant scenarios: “Give me data about customer X” … and you get data about Y.
  16. Added logging (ETW): reused buffers. Old code: track down bad buffer management.
  17. END: Feedback via Twitter, email, etc. What you liked vs. not. What was rushed or hard to understand. Help me do a better job next time.
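
Notes 14 and 15 explain why cancellation is delicate with HTTP pipelining: responses carry no request identifier and are matched to requests purely by order. The following Python sketch is an illustration under stated assumptions, not the actual .NET bug or fix. The `deliver` function and the `close_after_cancel` flag are hypothetical, and the "unsafe" branch simply shows what order-based matching does if a cancelled request is forgotten while its response is still on the wire.

```python
from collections import deque


def deliver(requests, cancelled=None, close_after_cancel=True):
    """Match pipelined responses to requests by arrival order.

    Assumption: the server answers every request it received, in order,
    regardless of what the client later cancels.
    """
    wire = deque(f"response to {r}" for r in requests)
    delivered = {}
    for request in requests:
        if request == cancelled:
            if close_after_cancel:
                # Safe handling: stop using this connection. Later requests
                # would have to be retried on a new connection.
                break
            # Unsafe handling: the request is dropped from bookkeeping, but
            # its response is still queued on the connection.
            continue
        delivered[request] = wire.popleft()
    return delivered


print(deliver(["A", "B", "C"]))
print(deliver(["A", "B", "C"], cancelled="B", close_after_cancel=False))
print(deliver(["A", "B", "C"], cancelled="B", close_after_cancel=True))
```

In the unsafe run, request C is handed the response that belonged to B, which is the kind of mix-up note 15 warns about in multi-tenant scenarios. In the safe run only A completes on this connection, trading throughput for correctness.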