SlideShare a Scribd company logo
1 of 22
War stories
from .NET team
.NET Core Summer event 2019 – Linz, AT
Karel Zikmund – @ziki_cz
Agenda
• Stories
• Investigations on .NET team
• Not just from me
• Lessons learned on the way
You won’t see any:
• Source code
• Debugger
Not on agenda
My First Serious Investigation
• Build lab for Windows component
• Build break 1x per week
• AccessViolation dialog hangs machine
• Repro:
• Once in ~50 runs
• Overnight run: 247 crashes out of 77,006 runs (0.3%)
mscorwks!UTSemReadWrite::UnlockRead+0xe
mscorwks!CMDSemReadWrite::~CMDSemReadWrite+0x14
mscorwks!RegMeta::DefineParam+0x19
cscomp!EMITTER::EmitParamProp
…
…
…
cscomp!CController::Compile
csc!main
My First Serious Investigation
My First Serious Investigation
• Does it by a chance reproduce on only one machine?
• Answer: How did you know?
• But why always the same callstack?
• Good question, no good answer … magic
• Lesson learned: Debugging HW errors is costly and hard
• Always ask: Does it repro on more than 1 machine?
Another MetaData story
MetaData format background:
• Basically database – rows and columns
• Example – TypeDef table:
• Indexes into tables/heaps are either 2B or 4B
• What happens if last TypeDef has no methods?
• MethodList = Number of methods + 1 = max + 1
• What happens if there is 0xffff methods?
Flags TypeName TypeNamespace Extends MethodList
(Public) “Foo” “Awesome.Story” … Method #10
(Private) “Bar” “Awesome.Story” … Method #11
Another MetaData story
• II.24.2.6 “#~ stream”
• If e is a simple index into a table with index i, it is stored using 2 bytes if table i has less than
2^16 rows, otherwise it is stored using 4 bytes.
• II.22.37 TypeDef : 0x02
• 21. If MethodList is non-null, it shall index a valid row in the MethodDef table, where valid
means 1 <= row <= rowcount+1 [ERROR]
• How do you fix it?
• “I’m on the fence whether we should (fix it), given it looks like people hit this about once in 17
years”
• https://github.com/dotnet/corefx/issues/29554
• Lesson learned: Not all bugs have to be fixed
TypeSystem – Collapsing interfaces
• Table of implemented interfaces:
class A : I, J {}
• With generics:
class C<T> : L<T> {}
class D<T> : C<T>, L<string> {}
class E : D<string>, I {}
0 1
I J
0 1 2
I J K
0 1
L<T> L<string>
0 1
L<string> I
0 1 2
L<string> L<string> I
class B : A, K {}
Fix:
Breaking changes – Intro
• Everyone wants fix for their bug
• But nobody wants to be broken
• Observation: 10% of fixes have unintended side-effects
• Extreme case: Perf improvement can break app
• How many customers?
• Lesson learned: Everything has risk of breaking someone
Breaking changes – Last build
• Finance app crashing – “last” build of Windows 8 on arm (Surface RT)
• Latent bug (introduced months ago)
• Bug triggered by:
1. Method in NGen image has to be across 8KB pages
2. GC has to be triggered at least twice when it’s on stack
• Unrelated change caused “unlucky” method order for:
• System.Net.Configuration.DefaultProxySectionInternal..ctor
• Lesson learned: Anything, really ANYTHING, has risk of breaking
Breaking changes – Huge impact
• Patch to .NET Framework broke certain tax SW
• Printing tax forms
• Update pushed few days before tax deadline in US
• Note: Printing was tested on both sides (Microsoft & tax SW
company)
• But only into file, not to printer
• Lessons learned: Be extra cautious around sensitive dates
Breaking changes – Below you
• RavenDB – blue screen after KB4487017 on .NET Core!
• dotnet/coreclr#22597
• PrefetchVirtualMemory
• Kernel memory
management bug
Networking – Security issue
• January: Researcher running ML models on Cosmos
• Suspicion about buffers – more logging
• March: Repro gone
• May: Similar report
• +2 weeks: It blows up (more teams & impact)
• All hands on-deck
• Small repro (20 min, then 1 min) … yay!
• TTD trace (iDNA / TTT) … bonus & life saver
Networking – Security issue
• Root-cause: HTTP pipelining under stress
• 13 years old bug (.NET 2.0)
Response 1
Request 1
Server
Response 1
Request 1
Server
Request 2
Response 2
Networking – Security issue
Request 1
Server
Request 2Request 3
Response 1Response 2
Networking – Security issue
Request 1
Server
Request 2Request 3
Response 1Response 2
Networking – Security issue
• We have workaround (disable pipelining) – perf impact
• Worked fix …
• Verifying fix …
• Repro fails after 4h 
• Same symptoms
• Repro sensitive to cloud network load (8-17)
• TTD (iDNA / TTT) does not work 
• Suspicion about buffers again
Networking – Security issue
• Bad buffer lifetime management – on sending side!
• 5 years old bug (.NET 4.5.2)
• Trigger found:
• Thanks to Skype team – 24h deployment of experiments
• Change in .NET 4.7.1
• Fix around the problematic area
• Making the opportunity window SMALLER!
• … counter-intuitive
• Code review – similar bug on receiving side (5 years old)
• Same symptoms as HTTP pipelining
Networking – Security issue
• Why so many customers/services hit it at once?
• Maybe Spectre & Meltdown fixes roll out?
• or just … magic
• Lesson learned: Weird coincidences can happen …
Optimizations
• Once upon a time, … there was a service in Microsoft
• List vs. array data structure perf
• Perspectives:
1. The data structure will have in practice 3-5 items
2. There 3 hops between servers for each request!!!
• Lesson learned: Avoid premature optimizations … at all cost
Lessons learned
• Always ask: Does it repro on more than 1 machine?
• Debugging HW bugs is costly
• Some bugs happen once in 17 years
• Spec bugs are hard to fix
• MetaData format bug
• Anything, really ANYTHING, has risk of breaking someone
• Innocent changes can trigger latent bugs elsewhere
• Impact may be huge – e.g. during tax season
• Always try to create small repro
• Make your and everyone’s life easier
• TTD (iDNA / TTT) is life saver
• Avoid premature optimizations … at all cost, save your time
• … sometimes there is just … magic
@ziki_cz
Thank you
• Feedback welcome
• Twitter DM, email, in-person, etc.
• Survey
• What you liked vs. not?
• Too rushed?
• Hard to understand?
• Boring?
• Didn’t meet your expectations?
@ziki_cz

More Related Content

Similar to .NET Core Summer event 2019 in Linz, AT - War stories from .NET team -- Karel Zikmund

Case Study of the Unexplained
Case Study of the UnexplainedCase Study of the Unexplained
Case Study of the Unexplainedshannomc
 
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013midnite_runr
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...smallerror
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...xlight
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitterRoger Xia
 
Vulnerability Inheritance in ICS (English)
Vulnerability Inheritance in ICS (English)Vulnerability Inheritance in ICS (English)
Vulnerability Inheritance in ICS (English)Digital Bond
 
Debugging multiplayer games
Debugging multiplayer gamesDebugging multiplayer games
Debugging multiplayer gamesMaciej Siniło
 
BSides MCR 2016: From CSV to CMD to qwerty
BSides MCR 2016: From CSV to CMD to qwertyBSides MCR 2016: From CSV to CMD to qwerty
BSides MCR 2016: From CSV to CMD to qwertyJerome Smith
 
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"Fwdays
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelDaniel Coupal
 
Asufe juniors-training session2
Asufe juniors-training session2Asufe juniors-training session2
Asufe juniors-training session2Omar Ahmed
 
TDD and Related Techniques for Non Developers (2012)
TDD and Related Techniques for Non Developers (2012)TDD and Related Techniques for Non Developers (2012)
TDD and Related Techniques for Non Developers (2012)Peter Kofler
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterJohn Adams
 
Interactions complicate debugging
Interactions complicate debuggingInteractions complicate debugging
Interactions complicate debuggingSyed Zaid Irshad
 
Static Code Analysis and AutoLint
Static Code Analysis and AutoLintStatic Code Analysis and AutoLint
Static Code Analysis and AutoLintLeander Hasty
 
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012TEST Huddle
 

Similar to .NET Core Summer event 2019 in Linz, AT - War stories from .NET team -- Karel Zikmund (20)

Case Study of the Unexplained
Case Study of the UnexplainedCase Study of the Unexplained
Case Study of the Unexplained
 
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
Patching Windows Executables with the Backdoor Factory | DerbyCon 2013
 
Surge2012
Surge2012Surge2012
Surge2012
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...Fixing Twitter  Improving The Performance And Scalability Of The Worlds Most ...
Fixing Twitter Improving The Performance And Scalability Of The Worlds Most ...
 
Fixing twitter
Fixing twitterFixing twitter
Fixing twitter
 
Fixing_Twitter
Fixing_TwitterFixing_Twitter
Fixing_Twitter
 
Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
 
Vulnerability Inheritance in ICS (English)
Vulnerability Inheritance in ICS (English)Vulnerability Inheritance in ICS (English)
Vulnerability Inheritance in ICS (English)
 
Debugging multiplayer games
Debugging multiplayer gamesDebugging multiplayer games
Debugging multiplayer games
 
BSides MCR 2016: From CSV to CMD to qwerty
BSides MCR 2016: From CSV to CMD to qwertyBSides MCR 2016: From CSV to CMD to qwerty
BSides MCR 2016: From CSV to CMD to qwerty
 
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
Алексей Ященко и Ярослав Волощук "False simplicity of front-end applications"
 
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The SequelSilicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
Silicon Valley Code Camp 2015 - Advanced MongoDB - The Sequel
 
Effective C++
Effective C++Effective C++
Effective C++
 
Asufe juniors-training session2
Asufe juniors-training session2Asufe juniors-training session2
Asufe juniors-training session2
 
TDD and Related Techniques for Non Developers (2012)
TDD and Related Techniques for Non Developers (2012)TDD and Related Techniques for Non Developers (2012)
TDD and Related Techniques for Non Developers (2012)
 
Chirp 2010: Scaling Twitter
Chirp 2010: Scaling TwitterChirp 2010: Scaling Twitter
Chirp 2010: Scaling Twitter
 
Interactions complicate debugging
Interactions complicate debuggingInteractions complicate debugging
Interactions complicate debugging
 
Static Code Analysis and AutoLint
Static Code Analysis and AutoLintStatic Code Analysis and AutoLint
Static Code Analysis and AutoLint
 
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012
 

More from Karel Zikmund

.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7Karel Zikmund
 
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel ZikmundNDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel ZikmundKarel Zikmund
 
NDC Sydney 2019 - Async Demystified -- Karel Zikmund
NDC Sydney 2019 - Async Demystified -- Karel ZikmundNDC Sydney 2019 - Async Demystified -- Karel Zikmund
NDC Sydney 2019 - Async Demystified -- Karel ZikmundKarel Zikmund
 
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel ZikmundWUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel ZikmundKarel Zikmund
 
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile....NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...Karel Zikmund
 
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel ZikmundKarel Zikmund
 
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf....NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...Karel Zikmund
 
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel ZikmundDotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel ZikmundKarel Zikmund
 
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...Karel Zikmund
 
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel ZikmundKarel Zikmund
 
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek SafarKarel Zikmund
 
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel ZikmundKarel Zikmund
 
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel ZikmundKarel Zikmund
 
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel ZikmundKarel Zikmund
 
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Prague 2017 - .NET Standard -- Karel ZikmundKarel Zikmund
 
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel ZikmundKarel Zikmund
 
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel ZikmundKarel Zikmund
 

More from Karel Zikmund (17)

.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7.NET Conf 2022 - Networking in .NET 7
.NET Conf 2022 - Networking in .NET 7
 
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel ZikmundNDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
NDC London 2020 - Challenges of Managing CoreFx Repo -- Karel Zikmund
 
NDC Sydney 2019 - Async Demystified -- Karel Zikmund
NDC Sydney 2019 - Async Demystified -- Karel ZikmundNDC Sydney 2019 - Async Demystified -- Karel Zikmund
NDC Sydney 2019 - Async Demystified -- Karel Zikmund
 
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel ZikmundWUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
WUG Days 2022 Brno - Networking in .NET 7.0 and YARP -- Karel Zikmund
 
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile....NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
.NET Core Summer event 2019 in Vienna, AT - .NET 5 - Future of .NET on Mobile...
 
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
.NET Core Summer event 2019 in Brno, CZ - Async demystified -- Karel Zikmund
 
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf....NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
.NET Core Summer event 2019 in Brno, CZ - .NET Core Networking stack and perf...
 
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel ZikmundDotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
DotNext 2017 in Moscow - Challenges of Managing CoreFX repo -- Karel Zikmund
 
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
DotNext 2017 in Moscow - .NET Core Networking stack and Performance -- Karel ...
 
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
.NET MeetUp Brno 2017 - Microsoft Engineering teams in Europe -- Karel Zikmund
 
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
.NET MeetUp Brno 2017 - Xamarin .NET internals -- Marek Safar
 
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Brno - Challenges of Managing CoreFX repo -- Karel Zikmund
 
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET Fringe 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
 
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Prague 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
 
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Prague 2017 - .NET Standard -- Karel Zikmund
 
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - Challenges of Managing CoreFX repo -- Karel Zikmund
 
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
.NET MeetUp Amsterdam 2017 - .NET Standard -- Karel Zikmund
 

Recently uploaded

Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencessuser9e7c64
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringHironori Washizaki
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)jennyeacort
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...Akihiro Suda
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfYashikaSharma391629
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLionel Briand
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfkalichargn70th171
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsAhmed Mohamed
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 

Recently uploaded (20)

Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Patterns for automating API delivery. API conference
Patterns for automating API delivery. API conferencePatterns for automating API delivery. API conference
Patterns for automating API delivery. API conference
 
Machine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their EngineeringMachine Learning Software Engineering Patterns and Their Engineering
Machine Learning Software Engineering Patterns and Their Engineering
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
Call Us🔝>༒+91-9711147426⇛Call In girls karol bagh (Delhi)
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
20240415 [Container Plumbing Days] Usernetes Gen2 - Kubernetes in Rootless Do...
 
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdfInnovate and Collaborate- Harnessing the Power of Open Source Software.pdf
Innovate and Collaborate- Harnessing the Power of Open Source Software.pdf
 
Large Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and RepairLarge Language Models for Test Case Evolution and Repair
Large Language Models for Test Case Evolution and Repair
 
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdfExploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
Exploring Selenium_Appium Frameworks for Seamless Integration with HeadSpin.pdf
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Unveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML DiagramsUnveiling Design Patterns: A Visual Guide with UML Diagrams
Unveiling Design Patterns: A Visual Guide with UML Diagrams
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 

.NET Core Summer event 2019 in Linz, AT - War stories from .NET team -- Karel Zikmund

  • 1. War stories from .NET team .NET Core Summer event 2019 – Linz, AT Karel Zikmund – @ziki_cz
  • 2. Agenda • Stories • Investigations on .NET team • Not just from me • Lessons learned on the way You won’t see any: • Source code • Debugger Not on agenda
  • 3. My First Serious Investigation • Build lab for Windows component • Build break 1x per week • AccessViolation dialog hangs machine • Repro: • Once in ~50 runs • Overnight run: 247 crashes out of 77,006 runs (0.3%)
  • 5. My First Serious Investigation • Does it by a chance reproduce on only one machine? • Answer: How did you know? • But why always the same callstack? • Good question, no good answer … magic • Lesson learned: Debugging HW errors is costly and hard • Always ask: Does it repro on more than 1 machine?
  • 6. Another MetaData story MetaData format background: • Basically database – rows and columns • Example – TypeDef table: • Indexes into tables/heaps are either 2B or 4B • What happens if last TypeDef has no methods? • MethodList = Number of methods + 1 = max + 1 • What happens if there is 0xffff methods? Flags TypeName TypeNamespace Extends MethodList (Public) “Foo” “Awesome.Story” … Method #10 (Private) “Bar” “Awesome.Story” … Method #11
  • 7. Another MetaData story • II.24.2.6 “#~ stream” • If e is a simple index into a table with index i, it is stored using 2 bytes if table i has less than 2^16 rows, otherwise it is stored using 4 bytes. • II.22.37 TypeDef : 0x02 • 21. If MethodList is non-null, it shall index a valid row in the MethodDef table, where valid means 1 <= row <= rowcount+1 [ERROR] • How do you fix it? • “I’m on the fence whether we should (fix it), given it looks like people hit this about once in 17 years” • https://github.com/dotnet/corefx/issues/29554 • Lesson learned: Not all bugs have to be fixed
  • 8. TypeSystem – Collapsing interfaces • Table of implemented interfaces: class A : I, J {} • With generics: class C<T> : L<T> {} class D<T> : C<T>, L<string> {} class E : D<string>, I {} 0 1 I J 0 1 2 I J K 0 1 L<T> L<string> 0 1 L<string> I 0 1 2 L<string> L<string> I class B : A, K {} Fix:
  • 9. Breaking changes – Intro • Everyone wants fix for their bug • But nobody wants to be broken • Observation: 10% of fixes have unintended side-effects • Extreme case: Perf improvement can break app • How many customers? • Lesson learned: Everything has risk of breaking someone
  • 10. Breaking changes – Last build • Finance app crashing – “last” build of Windows 8 on arm (Surface RT) • Latent bug (introduced months ago) • Bug triggered by: 1. Method in NGen image has to be across 8KB pages 2. GC has to be triggered at least twice when it’s on stack • Unrelated change caused “unlucky” method order for: • System.Net.Configuration.DefaultProxySectionInternal..ctor • Lesson learned: Anything, really ANYTHING, has risk of breaking
  • 11. Breaking changes – Huge impact • Patch to .NET Framework broke certain tax SW • Printing tax forms • Update pushed few days before tax deadline in US • Note: Printing was tested on both sides (Microsoft & tax SW company) • But only into file, not to printer • Lessons learned: Be extra cautious around sensitive dates
  • 12. Breaking changes – Below you • RavenDB – blue screen after KB4487017 on .NET Core! • dotnet/coreclr#22597 • PrefetchVirtualMemory • Kernel memory management bug
  • 13. Networking – Security issue • January: Researcher running ML models on Cosmos • Suspicion about buffers – more logging • March: Repro gone • May: Similar report • +2 weeks: It blows up (more teams & impact) • All hands on-deck • Small repro (20 min, then 1 min) … yay! • TTD trace (iDNA / TTT) … bonus & life saver
  • 14. Networking – Security issue • Root-cause: HTTP pipelining under stress • 13 years old bug (.NET 2.0) Response 1 Request 1 Server Response 1 Request 1 Server Request 2 Response 2
  • 15. Networking – Security issue Request 1 Server Request 2Request 3 Response 1Response 2
  • 16. Networking – Security issue Request 1 Server Request 2Request 3 Response 1Response 2
  • 17. Networking – Security issue • We have workaround (disable pipelining) – perf impact • Worked fix … • Verifying fix … • Repro fails after 4h  • Same symptoms • Repro sensitive to cloud network load (8-17) • TTD (iDNA / TTT) does not work  • Suspicion about buffers again
  • 18. Networking – Security issue • Bad buffer lifetime management – on sending side! • 5 years old bug (.NET 4.5.2) • Trigger found: • Thanks to Skype team – 24h deployment of experiments • Change in .NET 4.7.1 • Fix around the problematic area • Making the opportunity window SMALLER! • … counter-intuitive • Code review – similar bug on receiving side (5 years old) • Same symptoms as HTTP pipelining
  • 19. Networking – Security issue • Why so many customers/services hit it at once? • Maybe Spectre & Meltdown fixes roll out? • or just … magic • Lesson learned: Weird coincidences can happen …
  • 20. Optimizations • Once upon a time, … there was a service in Microsoft • List vs. array data structure perf • Perspectives: 1. The data structure will have in practice 3-5 items 2. There 3 hops between servers for each request!!! • Lesson learned: Avoid premature optimizations … at all cost
  • 21. Lessons learned • Always ask: Does it repro on more than 1 machine? • Debugging HW bugs is costly • Some bugs happen once in 17 years • Spec bugs are hard to fix • MetaData format bug • Anything, really ANYTHING, has risk of breaking someone • Innocent changes can trigger latent bugs elsewhere • Impact may be huge – e.g. during tax season • Always try to create small repro • Make your and everyone’s life easier • TTD (iDNA / TTT) is life saver • Avoid premature optimizations … at all cost, save your time • … sometimes there is just … magic @ziki_cz
  • 22. Thank you • Feedback welcome • Twitter DM, email, in-person, etc. • Survey • What you liked vs. not? • Too rushed? • Hard to understand? • Boring? • Didn’t meet your expectations? @ziki_cz

Editor's Notes

  1. Quickly about me: .NET team for almost 14 years Started as junior / out of college on Runtime – C++, pieces like Metadata, TypeSystem, Assembly Loader Later on moved to manager role Then moved to BCL (Base Class Libraries) – Networking area mainly (HttpClient) … working in open-source (.NET Core) Community manager of dotnet/corefx repo
  2. Lessons learned – maybe useful to you Maybe just helps you understand what is happening on the other side / below you I already had few people confirm they hit some/all situations Were able to identify with problems and recommendations
  3. 2006 January – 3 months in MS Large code base, dozens of machines, productivity impact on larger team Crash – “hang dialog” with AV Repro – great Getting heap dumps We get to see callstack … but before that, some quotes
  4. Simplified callstack for readability AV in MetaData emitting – defining a parameter Basically stack corruption (dangerous) Proper RW lock Who corrupts memory? …
  5. Costly and hard … and requires quite some expertise Variants: Different machine setup? … driver bugs Extreme from Maoni: Real HW?
  6. 1 year old story – 2018 May First background on MetaData Compressed indexes = just schema which says 2B, 4B … variable between files, but static/stable and given per file MethodList = Start of list of methods, INCLUSIVE
  7. How do you fix that? … You don’t … spec bug / format bug Changing rules means rewriting & recompiling all tools (CCI and command line tools like ildasm, or UI Reflector, ILSpy, Visual Studio, debuggers, profilers, …) Compensate? Rearranging fields/methods/params in a way the last one does not need the +1. Nasty Emitting fake type/method with field/method/param to push row count to 2^16. Also nasty Using 0 as valid value? Readers will be surprised, maybe other bugs?
  8. 2009/4 story VirtualStubDispatch uses indexes of interfaces to call virtual methods – code in D<T>, casting to K<string> will use index 1 for interface and method index. But if instance of E is passed, method on I (interface index 1) SECURITY issue! Missing feature – implementing in security patch … risk of breaking changes Proper description complicates spec a lot Kind of problem when spec had 1 reference implementation – described the implementation Bug since introduction of generics, many smart people looked at it, yet missed it
  9. Read slides
  10. OEM getting builds 2 days Paranoia
  11. Sensitive dates like tax date, shopping season? (December) … online stores usually have stop on any changes
  12. Federico Lois February case Blue screen – not something CoreCLR should be able to cause Used PrefetchVirtualMemory to optimize perf – rarely used When you’re on cutting edge, pushing the limits, you should be prepared for anything
  13. Last July (2018) Story starts 8 months earlier in December 2017 Is it server or client problem? … wireshark traces Around Feb, we know it is client - .NET or Windows March – repro is gone (they upgraded cluster) (fast forward 2 months) May another email thread – similar symptoms Back and forth Heated Realize it is 2 different products on the thread And then couple of more start coming in span of 2 weeks Impact on one customer is huge Potential: Data loss Information disclosure – mixing data in multi-tenant scenarios 3-4 weeks of all-hands on deck + 24/7 We had iDNA trace (TTD / TTT)
  14. What happens when requests are cancelled? If 1st – close connection If last – remove it & and mark for closing If in middle – remove it & and mark for closing
  15. Bad things can happen – imagine you asked: “Does the data exist?” … data loss Multi-tenant scenarios: “Give me data about customer X” … data about Y
  16. Added logging (ETW) – reused buffers Old code – track down bad buffer management
  17. Something like SharePoint Online, Bing, O365 OneNote – where response to customer matters Caring about perf early (with measurements!) != premature optimization
  18. Help me do better job next time