Architected for Performance
PCIe Hot-Plug and Error Handling for NVMe
2019 NVMe™ Annual Members Meeting and Developer Day
March 19, 2019
Prepared by:
Austin Bolen, Server Storage Technologist, Dell EMC
Curtis Ballard, Storage Technologist, HPE
Joe Cowan, Senior Systems Architect, HPE
Agenda
• The Importance of Hot-Plug and Error Handling for NVMe™
• Challenges with NVMe Hot-Plug and Error Handling
• Solutions to NVMe Hot-Plug and Error Handling Challenges
• Questions
The Importance of Hot-Plug and Error Handling for NVMe™
The Importance of Hot-Plug (RASM)
* https://software.intel.com/en-us/articles/rasm-a-primer-for-isv-applications-engineers
Better RASM (Reliability, Availability, Serviceability, Manageability) = Reduced TCO
Customer Requirements:
• Surprise/async hot-plug
  - No prepare-to-remove
• Parity with SAS/SATA or better
• Handle all PCIe errors, not just errors due to surprise/async removal
The Importance of Hot-Plug (Reliability)
Reliability:
Device reliability is key; however:
• Small failure rates are exacerbated at scale
  - Hundreds or thousands of systems per datacenter
  - Many drives per system
• NAND wears out
Failures will occur, so HA solutions will require hot-plug.
The Importance of Hot-Plug (Manageability)
Manageability:
• Monitoring and reporting of device failure or predicted failure
• Inventorying for re-provisioning of storage
The Importance of Hot-Plug (Serviceability)
Serviceability:
• Async hot-plug is required for SAS/SATA-equivalent serviceability for NVMe drives
• Async/surprise removal eliminates the need for:
  - Orderly removal software (a technician with physical access to replace drives may not have access to these software interfaces)
  - Costly orderly removal hardware (attention buttons, power controllers, etc.)
The Importance of Hot-Plug (Availability)
Availability:
• Hot-plug increases availability by avoiding costly downtime due to:
  - Replacing failed drives
  - Re-provisioning storage
Challenges with NVMe™ Hot-Plug and Error Handling
NVMe™ Hot-Plug/Error Handling – Why is it such a heavy lift?
Because it’s an ecosystem issue!
• NVMe Drive
• Platform
  - Hardware
  - Firmware
  - BMC
• PCIe Root Port/Switch
• Operating System
  - NVMe Driver
  - PCIe Driver
  - ACPI Driver
• Applications
Historically, each player has looked only at its own piece. But who is looking at the whole picture?
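[Figure: the blind men and the elephant, each describing only the part he touches: “It’s a rope!”, “It’s a wall!”, “It’s a spear!”, “It’s a tree!”, “It’s a fan!”, “It’s a snake!”]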
Hot-Plug Storage – A High-Level Comparison
[Figure: the processor and host software (operating system, drivers, applications, UEFI/BIOS) sit above a hot-plug barrier together with the SAS and SATA controllers, which reach SAS and SATA drives below the barrier over the SAS and SATA buses. The NVMe controller resides in the NVMe drive below the barrier, attached over the PCIe bus. Hardware above the barrier is not hot-pluggable; hardware below the barrier is hot-pluggable.]
• SAS/SATA drivers bind to controllers above the hot-plug barrier
  - Protocol conversion provides software isolation
  - Physical-layer conversion provides hardware isolation
• NVMe™ drivers bind to controllers below the hot-plug barrier
  - No protocol translation == no software isolation
  - No physical-layer conversion == no hardware isolation
The PCIe Hot-Plug Eras
(Where we’ve been, where we are)
• The Standard Hot-Plug Controller (SHPC) Era
  – Timeframe: PCI/PCI-X, early PCIe
  – Complex (196-page specification)
  – Orderly insertion/removal only
  – Async insertion/removal likely to crash the system
  – Additional (expensive) hardware:
    – Power controllers
    – Power/attention indicators and buttons
    – Mechanical Retention Latch (MRL)
• The Hot-Plug Surprise (HPS) Era
  – Timeframe: from the arrival of new form factors such as PCIe storage and Thunderbolt to the present day
  – New form factors demand a simplified user experience that eliminates orderly removal overhead
  – For NVMe, mimic the SAS/SATA hot-plug model
    – Surprise insertion/removal
  – Surprise removal not supported by most OSes
    – Software- or hardware-initiated orderly removal typically required
Hot-Plug Issues Persist After SHPC and HPS
• System crashes are still possible
• Errors occur with SHPC if the orderly removal process is not followed
• Synthesized all-1s data returned during errors is not always handled correctly by software
• No strict model for interaction between stack components, leading to race conditions that cause crashes and deadlocks
• Other issues
  - Timely detection of removal and insertion (including detection while in a low-power state)
  - Mechanical insertion/removal issues (slow insertion, angled insertion, etc.)
• Issues often require changes outside the component under test (OS, switch, etc.)
• SHPC and HPS aren’t robust enough for complex use cases
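Synthesized all-1s data is the usual symptom of a removed or link-down device: a read that cannot complete is typically returned as all 1s, so the Vendor ID reads back as 0xFFFF, a value no real function reports. Below is a minimal Python sketch of a defensive read, assuming a hypothetical `read_config_dword(offset)` callable in place of a real platform config-space accessor. Because all 1s can also be legitimate register content, the sketch only treats the data as invalid after re-checking the Vendor ID.

```python
from typing import Callable, Optional

ALL_ONES_32 = 0xFFFF_FFFF
VENDOR_ID_OFFSET = 0x00     # Vendor ID is bits 15:0 of config dword 0
INVALID_VENDOR_ID = 0xFFFF  # never reported by a real function

# Hypothetical accessor: config-space offset -> 32-bit value.
ConfigReader = Callable[[int], int]


def device_present(read_config_dword: ConfigReader) -> bool:
    """False if the Vendor ID reads back as all 1s, i.e. the device likely left."""
    vendor_id = read_config_dword(VENDOR_ID_OFFSET) & 0xFFFF
    return vendor_id != INVALID_VENDOR_ID


def checked_read(read_config_dword: ConfigReader, offset: int) -> Optional[int]:
    """Read a config dword, treating all 1s as suspect rather than as data.

    Returns None when the data looks synthesized (device not responding), so
    the caller can start removal/error handling instead of acting on 0xFFFFFFFF.
    """
    value = read_config_dword(offset)
    if value == ALL_ONES_32 and not device_present(read_config_dword):
        return None
    return value
```

The extra Vendor ID read costs one config access, but only on the rare all-1s path, so normal reads are unaffected.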
Solutions to NVMe™ Hot-Plug and Error Handling Challenges
Key Design Tenets
• Create a hot-plug and error handling/recovery “toolbox”
  - Allow for flexibility in the solution
  - Systems, form factors, and OSes all have different needs
  - Support all PCIe use cases, not just NVMe
  - Tools to handle unforeseen issues
• Fix known issues
• Leverage and reach parity with existing solutions
  - SAS/SATA model
    - Eliminate the need for orderly insertion/removal
  - Proprietary PCIe error recovery models
• Multi-phase approach with incremental improvements
• Error recovery mechanisms must be extensible to all PCIe errors
  - Surprise/async removal errors
  - Minimize the chance of issues due to accidental removal of the wrong device
  - Errors unrelated to hot-plug
Key Design Tenets
• Hooks for time-to-market
• System hardware/firmware changes should be sufficient for:
  - New system designs and form factors
  - Fixing defects/unforeseen issues
• Avoid/minimize the need for:
  - Future OS changes
  - Future PCIe Root Port/Switch changes
Industry Alignment
• Alignment/feedback from OEMs
  - Dell EMC
  - HPE
  - Lenovo
  - Oracle
• Alignment/feedback from PCIe Root Port and Switch vendors
  - AMD
  - Broadcom
  - Intel
  - Microsemi
• OSVs
  - Microsoft
  - VMware
  - Linux distributors/kernel developers
Standards-Based Solution
[Figure: ECN sponsors, standards bodies, and specifications]
Proposal, standard, stage, and description:
• System Firmware Intermediary (SFI): adds a system firmware layer between the OS and PCIe devices for hot-plug.
  – PCIe Base Spec: Ratified. ECN published to PCI-SIG website.
• Containment Error Recovery (CER): defines a software/firmware PCIe error recovery model built on top of Downstream Port Containment hardware.
  – PCIe Base Spec: Ratified. ECN published to PCI-SIG website.
  – ACPI Spec: Released in ACPI 6.3.
  – PCI Firmware Specification: Ratified. ECN published to PCI-SIG website.
• Hot-Plug Parameter Extensions (_HPX): allows system firmware to tell the OS how to set PCIe Configuration Space for hot-inserted PCIe devices.
  – ACPI Spec: Released in ACPI 6.3.
  – PCI Firmware Specification: Member review complete. Should be ratified shortly.
CER Era
[Figure: host SW/FW (operating system, drivers, applications, UEFI/BIOS) and the processor attach over the PCIe bus to Root Ports with DPC. One Root Port connects directly to an NVMe drive; another connects to a PCIe switch (upstream port plus Downstream Ports with DPC) that connects to further NVMe drives. An async removal or error at a drive is annotated with the recovery sequence:]
1. An async removal or other error is detected by the Root Port or Switch.
2. DPC in the Root Port or Switch contains the error by forcing/keeping the PCIe link down.
3. The Root Port or Switch notifies FW or the host OS.
4. FW and/or host OS entities attempt to recover from the error.
5. The host OS releases DPC and restarts the device if it is present and recovered.
• The Containment Error Recovery (CER) Era
  – Timeframe: Transitioning now
  – Replaces HPS
  – The term “async” replaces “surprise” in PCIe specs (i.e., async removal/insertion instead of surprise insertion/removal)
  – The CER software/firmware model can be used to recover from many PCIe errors, not just errors due to async removal
  – Utilizes Downstream Port Containment (DPC) hardware in PCIe Root Ports and Switch Downstream Ports to contain errors, including async-removal-related errors
  – Two CER modes: Native OS Controlled and Firmware First
    – Firmware First mode requires ACPI changes in the OS and BIOS/UEFI
  – Based on tried-and-true proprietary models
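The five-step containment flow above can be modeled as a small state machine. This is a toy Python sketch, not the PCIe-defined DPC register interface or the Linux EDR code: the class, the state names, and the `recovered`/`device_present` flags are hypothetical stand-ins for what a real port driver would learn from DPC status registers and firmware notifications.

```python
from enum import Enum, auto
from typing import List


class PortState(Enum):
    LINK_UP = auto()    # normal operation
    CONTAINED = auto()  # step 2: DPC triggered, link forced/held down
    NOTIFIED = auto()   # step 3: firmware or host OS notified
    RECOVERED = auto()  # step 4: recovery attempt succeeded
    FAILED = auto()     # step 4: recovery failed, port stays contained
    EMPTY = auto()      # step 5: DPC released, but no device to restart


class DownstreamPort:
    """Toy model of a Root Port or Switch Downstream Port with DPC (hypothetical)."""

    def __init__(self) -> None:
        self.state = PortState.LINK_UP
        self.log: List[str] = []

    def on_error(self, cause: str) -> None:
        # Steps 1-2: error detected; DPC contains it by keeping the link down.
        if self.state is PortState.LINK_UP:
            self.log.append(f"contained: {cause}")
            self.state = PortState.CONTAINED

    def notify(self) -> None:
        # Step 3: the port signals firmware or the host OS.
        if self.state is PortState.CONTAINED:
            self.state = PortState.NOTIFIED

    def attempt_recovery(self, recovered: bool) -> None:
        # Step 4: firmware and/or host OS entities try to recover.
        if self.state is PortState.NOTIFIED:
            self.state = PortState.RECOVERED if recovered else PortState.FAILED

    def release(self, device_present: bool) -> bool:
        # Step 5: the host OS releases DPC; the device is restarted only if it
        # is present and recovered. Returns True if the device was restarted.
        if self.state is not PortState.RECOVERED:
            return False
        self.state = PortState.LINK_UP if device_present else PortState.EMPTY
        return device_present
```

The point of the model is ordering: the link stays contained until recovery has been attempted, and a device is only restarted after DPC is released with the device still present. An async removal therefore ends in `EMPTY`, while a failed recovery leaves the port contained.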
System Firmware Intermediary Era
[Figure: the same SAS/SATA/NVMe hot-plug barrier diagram as the high-level comparison, with a System Firmware Intermediary (SFI) layer added between the host software and the PCIe hot-plug hardware.]
• SFI isolates the OS, drivers, and applications from PCIe hot-plug events; it does not alter the data path.
• Hardware isolation in PCIe Root Ports and Switch Downstream Ports
• Provides options to invoke system firmware (BIOS, UEFI, BMC, etc.) for hot-plug events
• Particularly useful for complex out-of-band (independent of the host OS) platform configuration of hot-inserted devices (e.g., unlocking TCG drives or device authentication)
• The System Firmware Intermediary (SFI) Era
  – Timeframe: Silicon support will arrive over the next several years
  – Does not replace DPC/CER; works alongside DPC/CER
  – Adds a hardware/firmware layer between the OS and devices for hot-plug
Hot-Plug Parameter Extensions (_HPX)
• _HPX exists across all hot-plug eras
• _HPX allows system firmware to provide system-specific PCIe configuration space settings to the OS
  – Not just for hot-inserted devices; also used if a device is reset at runtime
• New _HPX Setting Record (Type 3) defined in the ACPI specification
  – Previous setting records only worked for pre-defined registers
  – New registers required a spec update and an OS change
  – The new Type 3 record can specify any register, with an offset relative to offset 0h of:
    – The start of configuration space
    – A Capability Structure
    – An Extended Capability Structure
    – A Vendor-Specific Extended Capability
    – A Designated Vendor-Specific Extended Capability
• Handles different revisions of capability structures
  – Apply changes to any revision of the capability structure
  – Apply changes to a specific revision of the capability structure
  – Apply changes to capability structures with a revision greater than or equal to the specified revision
• Supports a simple if-then-else conditional grammar
  – E.g., to set PCIe configuration space registers to a preferred value based on device capability
• Lightweight alternative to SFI for simple config space settings
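The Type 3 record’s offset resolution and revision matching can be sketched as follows. This is an illustration only. The field names, the `CapTable` lookup (a stand-in for walking the device’s capability lists), and the mask/value read-modify-write are assumptions, not the binary record layout that ACPI 6.3 defines. Vendor-specific and DVSEC bases are omitted for brevity.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional, Tuple


class Base(Enum):
    CONFIG_SPACE = 0    # offset relative to the start of configuration space
    CAPABILITY = 1      # relative to a Capability Structure
    EXT_CAPABILITY = 2  # relative to an Extended Capability Structure


class RevMatch(Enum):
    ANY = 0       # apply to any revision
    EXACT = 1     # apply only to the specified revision
    AT_LEAST = 2  # apply to revisions >= the specified revision


@dataclass
class SettingRecord:  # hypothetical in-memory form of one setting
    base: Base
    offset: int
    mask: int   # bits to modify
    value: int  # new value for the masked bits
    cap_id: int = 0
    revision: int = 0
    rev_match: RevMatch = RevMatch.ANY


# (base kind, capability ID) -> (structure version, structure offset)
CapTable = Dict[Tuple[Base, int], Tuple[int, int]]


def resolve(rec: SettingRecord, caps: CapTable) -> Optional[int]:
    """Return the absolute register offset, or None if the record does not apply."""
    if rec.base is Base.CONFIG_SPACE:
        return rec.offset
    found = caps.get((rec.base, rec.cap_id))
    if found is None:
        return None  # device does not implement this structure
    version, start = found
    if rec.rev_match is RevMatch.EXACT and version != rec.revision:
        return None
    if rec.rev_match is RevMatch.AT_LEAST and version < rec.revision:
        return None
    return start + rec.offset


def apply_record(rec: SettingRecord, caps: CapTable, config: Dict[int, int]) -> bool:
    """Read-modify-write the targeted dword in a dict-based config space model."""
    where = resolve(rec, caps)
    if where is None:
        return False
    old = config.get(where, 0)
    config[where] = (old & ~rec.mask) | (rec.value & rec.mask)
    return True
```

Resolving offsets against the located structure, rather than hard-coding absolute addresses, is what lets one record serve devices whose capability lists are laid out differently.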
Example Pseudocode – Set Completion Timeout (CTO) Value based on the device’s Completion Timeout Ranges Supported:
    If CTO Range B supported then
        Set CTO Value to 65 ms to 210 ms
    Else if CTO Range C supported then
        Set CTO Value to 260 ms to 900 ms
    Else if CTO Range D supported then
        Set CTO Value to 4 s to 13 s
    Else
        Set CTO Disable
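The pseudocode above can be translated into runnable Python that operates on raw register values. The sketch assumes the Device Capabilities 2 and Device Control 2 bit layouts and the Completion Timeout Value encodings from the PCIe Base Specification: Ranges Supported in bits 3:0 and Completion Timeout Disable Supported in bit 4 of Device Capabilities 2, and the timeout value in bits 3:0 and disable in bit 4 of Device Control 2. Verify these against the spec revision in use. Unlike the pseudocode, it only sets CTO Disable when the device advertises support for it, and otherwise leaves the register unchanged.

```python
# Device Capabilities 2 (PCI Express Capability + 24h)
CTO_RANGE_A = 1 << 0  # listed for completeness; not used by this policy
CTO_RANGE_B = 1 << 1
CTO_RANGE_C = 1 << 2
CTO_RANGE_D = 1 << 3
CTO_DISABLE_SUPPORTED = 1 << 4

# Device Control 2 (PCI Express Capability + 28h)
CTO_VALUE_MASK = 0xF  # bits 3:0, Completion Timeout Value
CTO_DISABLE = 1 << 4  # bit 4, Completion Timeout Disable

# Completion Timeout Value encodings used by the slide's policy
CTO_65MS_210MS = 0b0110   # Range B
CTO_260MS_900MS = 0b1001  # Range C
CTO_4S_13S = 0b1101       # Range D


def choose_cto(devcap2: int, devctl2: int) -> int:
    """Return a new Device Control 2 value following the slide's preference order."""
    # Clear the current timeout value and disable bit; keep all other bits.
    ctl = devctl2 & ~(CTO_VALUE_MASK | CTO_DISABLE)
    if devcap2 & CTO_RANGE_B:
        return ctl | CTO_65MS_210MS
    if devcap2 & CTO_RANGE_C:
        return ctl | CTO_260MS_900MS
    if devcap2 & CTO_RANGE_D:
        return ctl | CTO_4S_13S
    if devcap2 & CTO_DISABLE_SUPPORTED:
        return ctl | CTO_DISABLE
    return devctl2  # nothing applicable: leave the register unchanged
```

For example, a device advertising Ranges A and B (`devcap2 = 0b0011`) gets the 65 ms to 210 ms encoding, with unrelated Device Control 2 bits preserved.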
Next Steps
• PCIe Root Ports and Switches
  - Add support for DPC/eDPC
  - Add support for SFI
• Operating Systems and OEMs
  - Add support for async removal in HPS mode as a stop-gap until CER can be fully implemented
  - Add support for the Containment Error Recovery model defined by PCI-SIG
    - Native OS Controlled and Firmware First models
  - Review/contribute to open source efforts
    - DPC Containment Error Recovery patches submitted to the Linux kernel
      - Also called Error Disconnect Recover (EDR), after the ACPI method used in the DPC CER model
    - _HPX patches submitted to the Linux kernel
• Connectors/Form Factors: design for async hot-plug
  - Prevent damage to I/O pins on hot-insert, typically by making ground pins longer than other pins
  - Limit current surge on hot-insert with either:
    - A pre-charge pin for each voltage rail that is second to mate, or
    - Soft-start/hot-plug circuits for each rail
  - Physical presence detection is mandatory
    - The presence pin should be the shortest pin, so the platform knows when the device is fully inserted
    - A presence pin may be needed on each end of the connector unless the connector is guaranteed not to mate at an angle
  - Make sure pins can’t cross-connect on insert
  - Consider pin-wipe issues: higher frequencies demand shorter pins, which makes it difficult to support pins of different lengths
  - Form factors should allow for stable insertion/removal
  - Form factors should allow adequate mount points
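The pin-length guidelines above amount to an ordering check: ground mates first, the pre-charge pin (if used instead of soft-start circuits) second, and the presence pin last. Below is a minimal sketch that flags violations, assuming a hypothetical pin table of (role, length) pairs in arbitrary units. It ignores pin wipe, angled insertion, and cross-connection, which need mechanical analysis rather than a length comparison.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical connector description: pin name -> (role, length in arbitrary units).
# Roles: "ground", "precharge", "power", "signal", "presence".
PinTable = Dict[str, Tuple[str, float]]


def _lengths(pins: PinTable, roles: Set[str]) -> List[float]:
    return [length for role, length in pins.values() if role in roles]


def check_mating_sequence(pins: PinTable) -> List[str]:
    """Return violations of the mate-first/mate-last ordering (empty list = OK)."""
    problems: List[str] = []
    ground = _lengths(pins, {"ground"})
    precharge = _lengths(pins, {"precharge"})
    presence = _lengths(pins, {"presence"})
    others = _lengths(pins, {"power", "signal"})

    # Ground mates first: every ground pin is longer than every other pin.
    non_ground = precharge + presence + others
    if ground and non_ground and min(ground) <= max(non_ground):
        problems.append("ground pins must be longest")

    # Pre-charge (optional; soft-start circuits are the alternative) mates
    # second: longer than power, signal, and presence pins.
    later = others + presence
    if precharge and later and min(precharge) <= max(later):
        problems.append("pre-charge pins must mate before power/signal pins")

    # Presence mates last, so the platform knows insertion is complete.
    if not presence:
        problems.append("presence detect pin is required")
    else:
        earlier = ground + precharge + others
        if earlier and max(presence) >= min(earlier):
            problems.append("presence pin must be shortest")
    return problems
```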
Resources
• ACPI 6.3: Add “Error Disconnect Recover” mechanism for DPC and new Hot-Plug Parameter Extensions (_HPX) Setting Record (Type 3)
  - https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf
  - (DPC EDR) https://mantis.uefi.org/mantis/view.php?id=1939 *
  - (_HPX) https://mantis.uefi.org/mantis/view.php?id=1922 *
• PCI Express Base Specification Revision 4.0 Version 1.0
  - https://members.pcisig.com/wg/PCI-SIG/document/10912?downloadRevision=active *
• PCIe Base Spec. ECN: Async Hot-Plug Updates (DPC/CER, SFI)
  - https://members.pcisig.com/wg/PCI-SIG/document/12400 *
• PCI Firmware Spec. ECN: Downstream Port Containment related Enhancements
  - https://members.pcisig.com/wg/PCI-SIG/document/12614 *
• PCI Firmware Spec. ECN: _HPX and PCIe Completion Timeout related _OSC Enhancements
  - https://members.pcisig.com/wg/PCI-SIG/document/12712 *
• Dell EMC Tech Note: NVMe Hot-Plug Challenges and Industry Adoption
  - https://downloads.dell.com/manuals/common/dfd_-_nvme_hot-plug_challenges_and_industry_adoption.pdf
• Implementing Hot-Plug in NVMe Storage Systems
  - https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2018/20180808_NVME-201-2_Yung.pdf
• The Modernization of PCIe Hot-Plug in Linux
  - https://lwn.net/Articles/767885/
* Requires member access to the relevant standards body website
Linux Enablement
• DPC Containment Error Recovery (CER)
  - Add Error Disconnect Recover (EDR) support: https://patchwork.kernel.org/cover/10833723/
  - Add _OSC based negotiation support for DPC: https://patchwork.kernel.org/patch/10833717/
  - Add Error Disconnect Recover (EDR) ACPI notifier support: https://patchwork.kernel.org/patch/10833725/
  - Add Error Disconnect Recover (EDR) support: https://patchwork.kernel.org/patch/10833721/
• Hot-Plug Parameter Extensions (_HPX)
  - Implement support for _HPX Type 3 tables: https://patchwork.kernel.org/cover/10843875/
  - Do not export pci_get_hp_params(): https://patchwork.kernel.org/patch/10843877/
  - Remove the need for 'struct hotplug_params': https://patchwork.kernel.org/patch/10843887/
  - Implement Type 3 _HPX record: https://patchwork.kernel.org/patch/10843883/
  - Advertise HPX type 3 support via _OSC: https://patchwork.kernel.org/patch/10855469/
Questions?

More Related Content

Similar to 04_Bolen-and-Ballard_PCIe-Hot-Plug-and-Error-Handling-for-NVMe_Final-3.13-apb (1).pptx

PCI Express* based Storage: Data Center NVM Express* Platform Topologies
PCI Express* based Storage: Data Center NVM Express* Platform TopologiesPCI Express* based Storage: Data Center NVM Express* Platform Topologies
PCI Express* based Storage: Data Center NVM Express* Platform TopologiesOdinot Stanislas
 
The State of CXL-related Activities within OCP
The State of CXL-related Activities within OCPThe State of CXL-related Activities within OCP
The State of CXL-related Activities within OCPMemory Fabric Forum
 
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSAccelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSCeph Community
 
NVMe_Infrastructure_final1.pdf
NVMe_Infrastructure_final1.pdfNVMe_Infrastructure_final1.pdf
NVMe_Infrastructure_final1.pdfIrfanBroadband
 
S104878 nvme-revolution-jburg-v1809b
S104878 nvme-revolution-jburg-v1809bS104878 nvme-revolution-jburg-v1809b
S104878 nvme-revolution-jburg-v1809bTony Pearson
 
Hyper-V Best Practices & Tips and Tricks
Hyper-V Best Practices & Tips and TricksHyper-V Best Practices & Tips and Tricks
Hyper-V Best Practices & Tips and TricksAmit Gatenyo
 
EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...
EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...
EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...Brian Boyd
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsAnand Haridass
 
CSW2017 Privilege escalation on high-end servers due to implementation gaps i...
CSW2017 Privilege escalation on high-end servers due to implementation gaps i...CSW2017 Privilege escalation on high-end servers due to implementation gaps i...
CSW2017 Privilege escalation on high-end servers due to implementation gaps i...CanSecWest
 
Platform Security Summit 18: Xen Security Weather Report 2018
Platform Security Summit 18: Xen Security Weather Report 2018Platform Security Summit 18: Xen Security Weather Report 2018
Platform Security Summit 18: Xen Security Weather Report 2018The Linux Foundation
 
Application hosting in the Intelligent WAN
Application hosting in the Intelligent WANApplication hosting in the Intelligent WAN
Application hosting in the Intelligent WANCisco DevNet
 
TechWiseTV Workshop: Cisco HyperFlex Systems
TechWiseTV Workshop: Cisco HyperFlex SystemsTechWiseTV Workshop: Cisco HyperFlex Systems
TechWiseTV Workshop: Cisco HyperFlex SystemsRobb Boyd
 
Stacki at the Seattle Scalability Meetup
Stacki at the Seattle Scalability MeetupStacki at the Seattle Scalability Meetup
Stacki at the Seattle Scalability MeetupStackIQ
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Cesar Maciel
 
Webinar: Untethering Compute from Storage
Webinar: Untethering Compute from StorageWebinar: Untethering Compute from Storage
Webinar: Untethering Compute from StorageAvere Systems
 
Presentation architecting a cloud infrastructure
Presentation   architecting a cloud infrastructurePresentation   architecting a cloud infrastructure
Presentation architecting a cloud infrastructuresolarisyourep
 
Presentation architecting a cloud infrastructure
Presentation   architecting a cloud infrastructurePresentation   architecting a cloud infrastructure
Presentation architecting a cloud infrastructurexKinAnx
 
NFV Orchestration for Optimal Performance
NFV Orchestration for Optimal PerformanceNFV Orchestration for Optimal Performance
NFV Orchestration for Optimal Performancedfilppi
 

Similar to 04_Bolen-and-Ballard_PCIe-Hot-Plug-and-Error-Handling-for-NVMe_Final-3.13-apb (1).pptx (20)

PCI Express* based Storage: Data Center NVM Express* Platform Topologies
PCI Express* based Storage: Data Center NVM Express* Platform TopologiesPCI Express* based Storage: Data Center NVM Express* Platform Topologies
PCI Express* based Storage: Data Center NVM Express* Platform Topologies
 
The State of CXL-related Activities within OCP
The State of CXL-related Activities within OCPThe State of CXL-related Activities within OCP
The State of CXL-related Activities within OCP
 
Evatronix track h
Evatronix   track hEvatronix   track h
Evatronix track h
 
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDSAccelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
Accelerating Cassandra Workloads on Ceph with All-Flash PCIE SSDS
 
NVMe_Infrastructure_final1.pdf
NVMe_Infrastructure_final1.pdfNVMe_Infrastructure_final1.pdf
NVMe_Infrastructure_final1.pdf
 
S104878 nvme-revolution-jburg-v1809b
S104878 nvme-revolution-jburg-v1809bS104878 nvme-revolution-jburg-v1809b
S104878 nvme-revolution-jburg-v1809b
 
Hyper-V Best Practices & Tips and Tricks
Hyper-V Best Practices & Tips and TricksHyper-V Best Practices & Tips and Tricks
Hyper-V Best Practices & Tips and Tricks
 
EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...
EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...
EMC Symmetrix VMAX: An Introduction to Enterprise Storage: Brian Boyd, Varrow...
 
Heterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of SystemsHeterogeneous Computing : The Future of Systems
Heterogeneous Computing : The Future of Systems
 
CSW2017 Privilege escalation on high-end servers due to implementation gaps i...
CSW2017 Privilege escalation on high-end servers due to implementation gaps i...CSW2017 Privilege escalation on high-end servers due to implementation gaps i...
CSW2017 Privilege escalation on high-end servers due to implementation gaps i...
 
Platform Security Summit 18: Xen Security Weather Report 2018
Platform Security Summit 18: Xen Security Weather Report 2018Platform Security Summit 18: Xen Security Weather Report 2018
Platform Security Summit 18: Xen Security Weather Report 2018
 
Application hosting in the Intelligent WAN
Application hosting in the Intelligent WANApplication hosting in the Intelligent WAN
Application hosting in the Intelligent WAN
 
Troubleshooting Storage Devices Using vRealize Operations (formerly vC Ops)
Troubleshooting Storage Devices Using vRealize Operations (formerly vC Ops)Troubleshooting Storage Devices Using vRealize Operations (formerly vC Ops)
Troubleshooting Storage Devices Using vRealize Operations (formerly vC Ops)
 
TechWiseTV Workshop: Cisco HyperFlex Systems
TechWiseTV Workshop: Cisco HyperFlex SystemsTechWiseTV Workshop: Cisco HyperFlex Systems
TechWiseTV Workshop: Cisco HyperFlex Systems
 
Stacki at the Seattle Scalability Meetup
Stacki at the Seattle Scalability MeetupStacki at the Seattle Scalability Meetup
Stacki at the Seattle Scalability Meetup
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
 
Webinar: Untethering Compute from Storage
Webinar: Untethering Compute from StorageWebinar: Untethering Compute from Storage
Webinar: Untethering Compute from Storage
 
Presentation architecting a cloud infrastructure
Presentation   architecting a cloud infrastructurePresentation   architecting a cloud infrastructure
Presentation architecting a cloud infrastructure
 
Presentation architecting a cloud infrastructure
Presentation   architecting a cloud infrastructurePresentation   architecting a cloud infrastructure
Presentation architecting a cloud infrastructure
 
NFV Orchestration for Optimal Performance
NFV Orchestration for Optimal PerformanceNFV Orchestration for Optimal Performance
NFV Orchestration for Optimal Performance
 

Recently uploaded

Abortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in Dammam
Abortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in DammamAbortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in Dammam
Abortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in Dammamahmedjiabur940
 
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一ougvy
 
Mahindra XUV new version for smooth travelling
Mahindra XUV new version for smooth travellingMahindra XUV new version for smooth travelling
Mahindra XUV new version for smooth travellingSailaja Gudipati
 
如何办理(UVic毕业证书)维多利亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UVic毕业证书)维多利亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UVic毕业证书)维多利亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UVic毕业证书)维多利亚大学毕业证成绩单本科硕士学位证留信学历认证mestb
 
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理uodye
 
Best CPU for gaming Intel Core i9-14900K 14th Gen Desktop CPU
Best CPU for gaming  Intel Core i9-14900K 14th Gen Desktop CPUBest CPU for gaming  Intel Core i9-14900K 14th Gen Desktop CPU
Best CPU for gaming Intel Core i9-14900K 14th Gen Desktop CPUZiaurRehman887108
 
一比一原版(CSUEB毕业证书)东湾分校毕业证原件一模一样
一比一原版(CSUEB毕业证书)东湾分校毕业证原件一模一样一比一原版(CSUEB毕业证书)东湾分校毕业证原件一模一样
一比一原版(CSUEB毕业证书)东湾分校毕业证原件一模一样ayoqf
 
如何办理(AUT毕业证书)奥克兰理工大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(AUT毕业证书)奥克兰理工大学毕业证成绩单本科硕士学位证留信学历认证如何办理(AUT毕业证书)奥克兰理工大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(AUT毕业证书)奥克兰理工大学毕业证成绩单本科硕士学位证留信学历认证mestb
 
如何办理(USYD毕业证书)悉尼大学毕业证成绩单原件一模一样
如何办理(USYD毕业证书)悉尼大学毕业证成绩单原件一模一样如何办理(USYD毕业证书)悉尼大学毕业证成绩单原件一模一样
如何办理(USYD毕业证书)悉尼大学毕业证成绩单原件一模一样wsppdmt
 
如何办理(OP毕业证书)奥塔哥理工学院毕业证成绩单本科硕士学位证留信学历认证
如何办理(OP毕业证书)奥塔哥理工学院毕业证成绩单本科硕士学位证留信学历认证如何办理(OP毕业证书)奥塔哥理工学院毕业证成绩单本科硕士学位证留信学历认证
如何办理(OP毕业证书)奥塔哥理工学院毕业证成绩单本科硕士学位证留信学历认证mestb
 
一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理
一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理
一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理uodye
 
办理(uw学位证书)美国华盛顿大学毕业证续费收据一模一样
办理(uw学位证书)美国华盛顿大学毕业证续费收据一模一样办理(uw学位证书)美国华盛顿大学毕业证续费收据一模一样
办理(uw学位证书)美国华盛顿大学毕业证续费收据一模一样vwymvu
 
NON INVASIVE GLUCOSE BLODD MONITORING SYSTEM (1) (2) (1).pptx
NON INVASIVE GLUCOSE BLODD MONITORING SYSTEM (1) (2) (1).pptxNON INVASIVE GLUCOSE BLODD MONITORING SYSTEM (1) (2) (1).pptx
NON INVASIVE GLUCOSE BLODD MONITORING SYSTEM (1) (2) (1).pptxSimmySharma12
 
Matrix Methods.pptxhhhhhhhhhhhhhhhhhhhhh
Matrix Methods.pptxhhhhhhhhhhhhhhhhhhhhhMatrix Methods.pptxhhhhhhhhhhhhhhhhhhhhh
Matrix Methods.pptxhhhhhhhhhhhhhhhhhhhhhjoshuaclack73
 
Vibration of Continuous Systems.pjjjjjjjjptx
Vibration of Continuous Systems.pjjjjjjjjptxVibration of Continuous Systems.pjjjjjjjjptx
Vibration of Continuous Systems.pjjjjjjjjptxjoshuaclack73
 
Test bank for consumer behaviour buying having and being eighth canadian edit...
Test bank for consumer behaviour buying having and being eighth canadian edit...Test bank for consumer behaviour buying having and being eighth canadian edit...
Test bank for consumer behaviour buying having and being eighth canadian edit...robinsonayot
 

Recently uploaded (20)

Abortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in Dammam
Abortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in DammamAbortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in Dammam
Abortion Pill for sale in Riyadh ((+918761049707) Get Cytotec in Dammam
 
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
在线制作(ANU毕业证书)澳大利亚国立大学毕业证成绩单原版一比一
 
Buy Abortion pills in Riyadh |+966572737505 | Get Cytotec
Buy Abortion pills in Riyadh |+966572737505 | Get CytotecBuy Abortion pills in Riyadh |+966572737505 | Get Cytotec
Buy Abortion pills in Riyadh |+966572737505 | Get Cytotec
 
Mahindra XUV new version for smooth travelling
Mahindra XUV new version for smooth travellingMahindra XUV new version for smooth travelling
Mahindra XUV new version for smooth travelling
 
如何办理(UVic毕业证书)维多利亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UVic毕业证书)维多利亚大学毕业证成绩单本科硕士学位证留信学历认证如何办理(UVic毕业证书)维多利亚大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(UVic毕业证书)维多利亚大学毕业证成绩单本科硕士学位证留信学历认证
 
Contact +971581248768 to buy 100% original and safe abortion pills in Dubai a...
Contact +971581248768 to buy 100% original and safe abortion pills in Dubai a...Contact +971581248768 to buy 100% original and safe abortion pills in Dubai a...
Contact +971581248768 to buy 100% original and safe abortion pills in Dubai a...
 
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
一比一原版(USYD毕业证书)澳洲悉尼大学毕业证如何办理
 
Best CPU for gaming Intel Core i9-14900K 14th Gen Desktop CPU
Best CPU for gaming  Intel Core i9-14900K 14th Gen Desktop CPUBest CPU for gaming  Intel Core i9-14900K 14th Gen Desktop CPU
Best CPU for gaming Intel Core i9-14900K 14th Gen Desktop CPU
 
In Riyadh Saudi Arabia |+966572737505 | Buy Cytotec| Get Abortion pills
In Riyadh Saudi Arabia |+966572737505 | Buy Cytotec| Get Abortion pillsIn Riyadh Saudi Arabia |+966572737505 | Buy Cytotec| Get Abortion pills
In Riyadh Saudi Arabia |+966572737505 | Buy Cytotec| Get Abortion pills
 
Abortion pills in Jeddah Saudi Arabia! +966572737505 Where to buy cytotec
Abortion pills in Jeddah Saudi Arabia! +966572737505 Where to buy cytotecAbortion pills in Jeddah Saudi Arabia! +966572737505 Where to buy cytotec
Abortion pills in Jeddah Saudi Arabia! +966572737505 Where to buy cytotec
 
一比一原版(CSUEB毕业证书)东湾分校毕业证原件一模一样
一比一原版(CSUEB毕业证书)东湾分校毕业证原件一模一样一比一原版(CSUEB毕业证书)东湾分校毕业证原件一模一样
一比一原版(CSUEB毕业证书)东湾分校毕业证原件一模一样
 
如何办理(AUT毕业证书)奥克兰理工大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(AUT毕业证书)奥克兰理工大学毕业证成绩单本科硕士学位证留信学历认证如何办理(AUT毕业证书)奥克兰理工大学毕业证成绩单本科硕士学位证留信学历认证
如何办理(AUT毕业证书)奥克兰理工大学毕业证成绩单本科硕士学位证留信学历认证
 
如何办理(USYD毕业证书)悉尼大学毕业证成绩单原件一模一样
如何办理(USYD毕业证书)悉尼大学毕业证成绩单原件一模一样如何办理(USYD毕业证书)悉尼大学毕业证成绩单原件一模一样
如何办理(USYD毕业证书)悉尼大学毕业证成绩单原件一模一样
 
如何办理(OP毕业证书)奥塔哥理工学院毕业证成绩单本科硕士学位证留信学历认证
如何办理(OP毕业证书)奥塔哥理工学院毕业证成绩单本科硕士学位证留信学历认证如何办理(OP毕业证书)奥塔哥理工学院毕业证成绩单本科硕士学位证留信学历认证
如何办理(OP毕业证书)奥塔哥理工学院毕业证成绩单本科硕士学位证留信学历认证
 
一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理
一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理
一比一维多利亚大学毕业证(victoria毕业证)成绩单学位证如何办理
 
办理(uw学位证书)美国华盛顿大学毕业证续费收据一模一样
办理(uw学位证书)美国华盛顿大学毕业证续费收据一模一样办理(uw学位证书)美国华盛顿大学毕业证续费收据一模一样
办理(uw学位证书)美国华盛顿大学毕业证续费收据一模一样
 
NON INVASIVE GLUCOSE BLODD MONITORING SYSTEM (1) (2) (1).pptx
NON INVASIVE GLUCOSE BLODD MONITORING SYSTEM (1) (2) (1).pptxNON INVASIVE GLUCOSE BLODD MONITORING SYSTEM (1) (2) (1).pptx
NON INVASIVE GLUCOSE BLODD MONITORING SYSTEM (1) (2) (1).pptx
 
Matrix Methods.pptxhhhhhhhhhhhhhhhhhhhhh
Matrix Methods.pptxhhhhhhhhhhhhhhhhhhhhhMatrix Methods.pptxhhhhhhhhhhhhhhhhhhhhh
Matrix Methods.pptxhhhhhhhhhhhhhhhhhhhhh
 
Vibration of Continuous Systems.pjjjjjjjjptx
Vibration of Continuous Systems.pjjjjjjjjptxVibration of Continuous Systems.pjjjjjjjjptx
Vibration of Continuous Systems.pjjjjjjjjptx
 
Test bank for consumer behaviour buying having and being eighth canadian edit...
Test bank for consumer behaviour buying having and being eighth canadian edit...Test bank for consumer behaviour buying having and being eighth canadian edit...
Test bank for consumer behaviour buying having and being eighth canadian edit...
 

04_Bolen-and-Ballard_PCIe-Hot-Plug-and-Error-Handling-for-NVMe_Final-3.13-apb (1).pptx

  • 1. Architected for Performance PCIe Hot-Plug and Error Handling for NVMe 2019 NVMe™ Annual Members Meeting and Developer Day March 19, 2019 Prepared by: Austin Bolen, Server Storage Technologist, Dell EMC Curtis Ballard, Storage Technologist, HPE Joe Cowan, Senior Systems Architect, HPE
  • 2. Agenda • The Importance of Hot-Plug and Error Handling for NVMe™ • Challenges with NVMe Hot-Plug and Error Handling • Solutions to NVMe Hot-Plug and Error Handling Challenges • Questions
  • 3. The Importance of Hot-Plug and Error Handling for NVMe™
  • 4. The Importance of Hot-Plug (RASM) * https://software.intel.com/en-us/articles/rasm-a-primer-for-isv-applications-engineers Better RASM = Reduced TCO Customer Requirements: • Surprise/Async hot-plug - No prepare-to-remove • Parity with SAS/SATA or better • Handle all PCIe errors, not just errors due to surprise/async removal
  • 5. The Importance of Hot-Plug (Reliability) * https://software.intel.com/en-us/articles/rasm-a-primer-for-isv-applications-engineers Reliability: Device reliability is key, however: • Small failure rates exacerbated at scale • Hundreds or thousands of systems per datacenter • Many drives per system • NAND wears out Failures will occur HA solutions will require Hot-Plug
  • 6. The Importance of Hot-Plug (Manageability) * https://software.intel.com/en-us/articles/rasm-a-primer-for-isv-applications-engineers Manageability: • Monitoring and reporting of device failure or predicted failure • Inventorying for re-provisioning of storage
  • 7. The Importance of Hot-Plug (Serviceability) * https://software.intel.com/en-us/articles/rasm-a-primer-for-isv-applications-engineers Serviceability: • Async hot-plug is required for SAS/SATA equivalent serviceability for NVMe drives • Async/surprise removal eliminates the need for: • Orderly removal software • A technician with physical access to replace drives may not have access to these software interfaces • Costly orderly removal hardware (attention buttons, power controllers, etc.)
  • 8. The Importance of Hot-Plug (Availability) Availability: • Hot-plug increases availability by avoiding costly downtime due to: • Replacing failed drives • Re-provisioning storage
  • 9. Challenges with NVMe™ Hot-Plug and Error Handling
  • 10. NVMe™ Hot-Plug/Error Handling – Why is it such a heavy lift? Because it’s an ecosystem issue! • NVMe Drive • Platform • Hardware • Firmware • BMC • PCIe Root Port/Switch • Operating System • NVMe Driver • PCIe Driver • ACPI Driver • Applications Each player has historically looked only at its own piece. But who is looking at the whole picture? [Image: the blind men and the elephant, each describing a different part:] It’s a rope! It’s a wall! It’s a spear! It’s a tree! It’s a fan! It’s a snake!
  • 11. Hot-Plug Storage – A High-Level Comparison [Diagram: host software (operating system, drivers, applications, UEFI/BIOS) runs on the processor; the SAS and SATA controllers sit above the hot-plug barrier and connect to SAS and SATA drives over the SAS and SATA buses, while the NVMe controller sits inside the NVMe drive below the barrier, attached over the PCIe bus. Hardware above the barrier is not hot-pluggable; hardware below the barrier is hot-pluggable.] • SAS/SATA drivers bind to controllers above the hot-plug barrier • Protocol conversion provides software isolation • Physical layer conversion provides hardware isolation • NVMe™ drivers bind to controllers below the hot-plug barrier • No protocol translation == No software isolation • No physical layer conversion == No hardware isolation
  • 12. The PCIe Hot-Plug Eras (Where we’ve been, Where we are) • The Standard Hot-Plug Controller (SHPC) Era – Timeframe: PCI/PCI-X, Early PCIe – Complex (196-page specification) – Orderly insertion/removal only – Async insertion/removal likely to crash the system – Additional (expensive) hardware – Power Controllers – Power/Attention Indicators/Buttons – Mechanical Retention Latch (MRL) • The Hot-Plug Surprise (HPS) Era – Timeframe: Starting with new form factors like PCIe storage and Thunderbolt, through the present day – New form factors demand a simplified user experience that eliminates orderly removal overhead – For NVMe, mimic the SAS/SATA hot-plug model – Surprise insertion/removal – Surprise removal not supported by most OSes – Software- or hardware-initiated orderly removal typically required
  • 13. Hot-Plug Issues Persist After SHPC and HPS • System crashes are still possible • Errors if the orderly removal process is not followed with SHPC • Synthesized all-1s data during errors is not always handled correctly by software • No strict model for interaction among stack components, which leads to race conditions causing crashes and deadlocks • Other issues • Timely detection of removal and insertion (detection while in a low-power state) • Mechanical insert/remove issues (slow insert, angled insert, etc.) • Issues often require changes outside the component under test (OS, switch, etc.) • SHPC and HPS aren’t robust enough for complex use cases
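One common defensive pattern for the all-1s problem above can be sketched as follows (an illustration, not a method taken from this presentation): treat an all-1s register read as a hint that the device may be gone, then confirm by re-reading the Vendor ID from configuration space, which reads as 0xFFFF when no function responds. The function name and the `read_vendor_id` callback are hypothetical stand-ins for platform-specific accessors.

```python
from typing import Callable

ALL_ONES_32 = 0xFFFFFFFF      # value typically synthesized for a failed 32-bit read
NO_DEVICE_VENDOR_ID = 0xFFFF  # Vendor ID read back when no function responds


def device_still_present(mmio_value: int, read_vendor_id: Callable[[], int]) -> bool:
    """Decide whether an MMIO read result came from a live device.

    An all-1s value is ambiguous because some registers can legitimately
    read as 0xFFFFFFFF, so a second probe of the config-space Vendor ID is
    used to confirm removal before the caller starts teardown.
    """
    if mmio_value != ALL_ONES_32:
        return True  # the value was not the synthesized all-1s pattern
    return read_vendor_id() != NO_DEVICE_VENDOR_ID


# Simulated async removal: both the MMIO and the config read return all 1s.
print(device_still_present(0xFFFFFFFF, lambda: 0xFFFF))  # False
```

The second read matters because acting on a single all-1s value would misclassify registers whose legitimate contents happen to be all 1s.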
  • 14. Solutions to NVMe™ Hot-Plug and Error Handling Challenges
  • 15. Key Design Tenets • Create a hot-plug and error handling/recovery “toolbox” - Allow for flexibility in the solution - Systems, form factors, and OSes all have different needs - Support all PCIe use cases, not just NVMe - Tools to handle unforeseen issues • Fix known issues • Leverage and reach parity with existing solutions - SAS/SATA model › Eliminate the need for orderly insertion/removal - Proprietary PCIe error recovery models • Multi-phase approach with incremental improvements • Error recovery mechanisms must be extensible to all PCIe errors - Surprise/async removal errors - Minimize the chance of issues due to accidental removal of the wrong device - Errors unrelated to hot-plug
  • 16. Key Design Tenets • Hooks for time-to-market • System hardware/firmware changes should be sufficient for: • New system designs and form factors • Fixing defects/unforeseen issues • Avoid/minimize need for: • Future OS changes • Future PCIe Root Port/Switch changes
  • 17. Industry Alignment • Alignment/Feedback from OEMs • Dell EMC • HPE • Lenovo • Oracle • Alignment/Feedback from PCIe Root Port and Switch Vendors • AMD • Broadcom • Intel • Microsemi • OSVs • Microsoft • VMware • Linux distributors/kernel developers
  • 18. Standards-Based Solution Proposal (Standard – Specification: Stage – Description) • System Firmware Intermediary (SFI) – PCIe Base Spec: Ratified; ECN published to the PCI-SIG website – Adds a system firmware layer between the OS and PCIe devices for hot-plug • Containment Error Recovery (CER) – PCIe Base Spec: Ratified; ECN published to the PCI-SIG website – ACPI Spec: Released in ACPI 6.3 – PCI Firmware Specification: Ratified; ECN published to the PCI-SIG website – Defines a software/firmware PCIe error recovery model built on top of Downstream Port Containment hardware • Hot-Plug Parameter Extensions (_HPX) – ACPI Spec: Released in ACPI 6.3 – PCI Firmware Specification: Member review complete; should be ratified shortly – Allows system firmware to tell the OS how to set PCIe Configuration Space for hot-inserted PCIe devices
  • 19. CER Era [Diagram: host SW/FW (operating system, drivers, applications, UEFI/BIOS) above the processor; NVMe drives attach over PCIe either directly to PCIe Root Ports with DPC or through a PCIe switch (upstream port plus downstream ports with DPC). Error flow: 1. Async removal or other errors are detected by the Root Port or Switch. 2. DPC in the Root Port or Switch contains errors by forcing/keeping the PCIe link down. 3. The Root Port or Switch notifies FW or the host OS. 4. FW and/or host OS entities attempt to recover from the error. 5. The host OS releases DPC and restarts the device if present and recovered.] • The Containment Error Recovery (CER) Era – Timeframe: Transitioning now – Replaces HPS – The term “async” replaces “surprise” (i.e., async removal/insertion instead of surprise insertion/removal) in PCIe specs – The CER software/firmware model can be used to recover from many PCIe errors, not just errors due to async removal – Utilizes Downstream Port Containment (DPC) hardware in PCIe root ports and switch downstream ports to contain errors, including async-removal-related errors – Two CER modes: Native OS Controlled and Firmware First › Firmware First mode requires ACPI changes in the OS and BIOS/UEFI – Based on tried-and-true proprietary models
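The five-step CER flow above can be modeled as a small state machine. This is a simplified sketch under our own assumptions: `DpcPort` and `cer_recover` are hypothetical names, notification and recovery are injected callbacks, and real hardware releases containment by clearing the DPC Trigger Status bit in the DPC Status register (write 1 to clear, per our reading of the PCIe Base Specification), which the model abstracts away. Keeping an unrecovered but present device contained is also our assumption, not something the slide specifies.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DpcPort:
    """Hypothetical model of a Root Port or Switch Downstream Port with DPC."""
    name: str
    link_up: bool = True
    dpc_triggered: bool = False
    events: List[str] = field(default_factory=list)

    def contain(self, reason: str) -> None:
        # Steps 1-2: the port detects the error and DPC contains it by
        # forcing the link down and keeping it down while triggered.
        self.events.append(f"error detected: {reason}")
        self.dpc_triggered = True
        self.link_up = False

    def release(self, device_present: bool) -> None:
        # Releasing containment lets the link retrain, but it only
        # comes back up if a device is actually in the slot.
        self.dpc_triggered = False
        self.link_up = device_present


def cer_recover(port: DpcPort,
                notify: Callable[[DpcPort], None],
                attempt_recovery: Callable[[DpcPort], bool],
                device_present: Callable[[], bool]) -> str:
    """Run steps 3-5 of the CER flow for a port whose DPC has triggered."""
    if not port.dpc_triggered:
        return "no containment"
    notify(port)                        # Step 3: notify FW or host OS
    recovered = attempt_recovery(port)  # Step 4: FW and/or OS recovery
    if not device_present():
        port.release(False)             # Step 5: nothing to restart
        return "slot empty"
    if recovered:
        port.release(True)              # Step 5: release DPC and restart
        port.events.append("device restarted")
        return "restarted"
    return "left contained"             # assumption: keep a bad device isolated


# Async removal: the drive is gone, so the link stays down after release.
port = DpcPort("RP0")
port.contain("async removal")
result = cer_recover(port,
                     notify=lambda p: p.events.append("OS notified"),
                     attempt_recovery=lambda p: True,
                     device_present=lambda: False)
print(result, port.link_up)  # slot empty False
```

The same sequence applies whether the error came from an async removal or from some other PCIe error; only the callbacks differ between the Native OS Controlled and Firmware First modes.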
  • 20. System Firmware Intermediary Era [Diagram: the hot-plug storage comparison from slide 11 (SAS/SATA controllers above the hot-plug barrier; NVMe controller inside the drive below it, on the PCIe bus) with a System Firmware Intermediary (SFI) added on the PCIe path.] • SFI isolates PCIe hot-plug events from the OS, drivers, and applications; it does not alter the data path • Hardware isolation in PCIe Root Ports and Switch Downstream Ports • Provides options to invoke system firmware (BIOS, UEFI, BMC, etc.) for hot-plug events • Particularly useful for complex out-of-band (independent of host OS) platform configuration of hot-inserted devices (e.g., unlocking TCG drives or device authentication) • The System Firmware Intermediary (SFI) Era – Timeframe: Silicon support will arrive over the next several years – Does not replace DPC/CER; works alongside DPC/CER – Adds a hardware/firmware layer between the OS and devices for hot-plug
  • 21. Hot-Plug Parameter Extensions (_HPX) • _HPX exists across all hot-plug eras • _HPX allows system firmware to provide system-specific PCIe configuration space settings to the OS – Not just for hot-inserted devices; also used if a device is reset at runtime • New _HPX Setting Record (Type 3) defined in the ACPI specification – Previous setting records only worked for pre-defined registers – New registers required a spec update and an OS change – The new Type 3 record can specify any register with an offset relative to offset 0h of: – The start of configuration space – A Capability Structure – An Extended Capability Structure – A Vendor-Specific Extended Capability – A Designated Vendor-Specific Extended Capability • Handles different revisions of capability structures – Apply changes to any revision of the capability structure – Apply changes to a specific revision of the capability structure – Apply changes to capability structures with a revision greater than or equal to the specified revision • Supports a simple if-then-else conditional grammar – E.g., to set PCIe configuration space registers to a preferred value based on device capability • Lightweight alternative to SFI for simple config space settings Example Pseudocode – Set the Completion Timeout (CTO) Value based on the device’s Completion Timeout Ranges Supported: If CTO Range B supported then Set CTO Value to 65 ms to 210 ms Else if CTO Range C supported then Set CTO Value to 260 ms to 900 ms Else if CTO Range D supported then Set CTO Value to 4 s to 13 s Else Set CTO Disable
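The CTO pseudocode above can be sketched in Python against an in-memory copy of configuration space. The sketch also shows what "offset relative to a Capability Structure" means: the target registers are found by walking the capability list to the PCI Express Capability (ID 10h) and then adding a fixed offset. Register offsets and encodings come from the PCIe Base Specification as we understand it and should be checked against the spec. This is a host-side emulation of the policy only, not the ACPI Type 3 record format or any OS's _HPX parser, and all function names are hypothetical.

```python
from typing import Optional

PCI_CAP_PTR = 0x34     # Capabilities Pointer in the type 0/1 config header
PCIE_CAP_ID = 0x10     # PCI Express Capability ID
DEVCAP2_OFF = 0x24     # Device Capabilities 2, relative to the PCIe Capability
DEVCTL2_OFF = 0x28     # Device Control 2, relative to the PCIe Capability

# Completion Timeout Ranges Supported (Device Capabilities 2, bits 3:0)
RANGE_B, RANGE_C, RANGE_D = 1 << 1, 1 << 2, 1 << 3
# Completion Timeout Value encodings (Device Control 2, bits 3:0)
CTO_65MS_TO_210MS = 0b0110
CTO_260MS_TO_900MS = 0b1001
CTO_4S_TO_13S = 0b1101
CTO_DISABLE = 1 << 4   # Completion Timeout Disable (Device Control 2, bit 4)


def read_le(cfg: bytearray, off: int, size: int) -> int:
    """Read a little-endian register of `size` bytes from config space."""
    return int.from_bytes(cfg[off:off + size], "little")


def find_capability(cfg: bytearray, cap_id: int) -> Optional[int]:
    """Walk the standard capability list and return the structure's offset."""
    ptr = cfg[PCI_CAP_PTR] & 0xFC
    seen = set()
    while ptr and ptr not in seen:      # stop at end of list or on a loop
        seen.add(ptr)
        if cfg[ptr] == cap_id:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC
    return None


def apply_cto_policy(cfg: bytearray) -> Optional[int]:
    """Apply the slide's if/else-if chain; return the new Device Control 2."""
    cap = find_capability(cfg, PCIE_CAP_ID)
    if cap is None:
        return None                     # not a PCIe function; nothing to do
    ranges = read_le(cfg, cap + DEVCAP2_OFF, 4) & 0xF
    ctl = read_le(cfg, cap + DEVCTL2_OFF, 2) & ~0x1F  # clear value + disable
    if ranges & RANGE_B:
        ctl |= CTO_65MS_TO_210MS
    elif ranges & RANGE_C:
        ctl |= CTO_260MS_TO_900MS
    elif ranges & RANGE_D:
        ctl |= CTO_4S_TO_13S
    else:
        ctl |= CTO_DISABLE
    cfg[cap + DEVCTL2_OFF:cap + DEVCTL2_OFF + 2] = ctl.to_bytes(2, "little")
    return ctl


# Hypothetical 256-byte config space: PCIe Capability at 0x40 advertising
# Completion Timeout Ranges B and C.
cfg = bytearray(256)
cfg[PCI_CAP_PTR] = 0x40
cfg[0x40] = PCIE_CAP_ID               # next pointer at 0x41 stays 0 (end)
cfg[0x40 + DEVCAP2_OFF] = RANGE_B | RANGE_C
print(hex(apply_cto_policy(cfg)))     # 0x6
```

A production policy would likely also check Completion Timeout Disable Supported (Device Capabilities 2, bit 4) before taking the final Else branch; the slide's pseudocode does not show that check.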
  • 22. Next Steps • PCIe Root Ports and Switches - Add support for DPC/eDPC - Add support for SFI • Operating Systems and OEMs - Add support for async removal in HPS mode as a stop-gap until CER can be fully implemented - Add support for the Containment Error Recovery model defined by PCI-SIG › Native OS controlled and Firmware First models - Review/contribute to open source efforts › DPC Containment Error Recovery patches submitted to the Linux kernel (also called Error Disconnect Recover (EDR), after the ACPI method used in the DPC CER model) › _HPX patches submitted to the Linux kernel • Connectors/Form Factors - Design for async hot-plug - Prevent damage to I/O pins on hot-insert, typically by making ground pins longer than other pins - Limit current surge on hot-insert › Pre-charge pin for each voltage rail that is second to mate, or › Soft-start/hot-plug circuits for each rail - Physical presence detection is mandatory › The presence pin should be the shortest pin so the platform knows when the device is fully inserted › A presence pin may be needed on each end of the connector unless you can guarantee the connector cannot mate at an angle - Make sure pins can’t cross-connect on insert - Consider pin-wipe issues: higher frequencies demand shorter pin lengths, making it difficult to support pins of different lengths - Form factors should allow for stable insertion/removal - Form factors should allow adequate mount points
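The connector guidance above boils down to a mating order: ground first, any pre-charge pins second, power and signals next, presence last. A hypothetical design-review check can encode that order by treating longer contacts as mating earlier. Pin names, roles, and lengths below are invented for illustration, and the check ignores the cross-connect and pin-wipe concerns also listed above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Pin:
    name: str
    role: str          # "ground", "precharge", "power", "signal" or "presence"
    length_mm: float   # hypothetical contact length; longer pins mate first


def lengths(pins: List[Pin], *roles: str) -> List[float]:
    """Collect contact lengths for pins whose role is in `roles`."""
    return [p.length_mm for p in pins if p.role in roles]


def check_mating_sequence(pins: List[Pin]) -> List[str]:
    """Return violations of the hot-insert ordering rules from the slide."""
    problems = []
    ground = lengths(pins, "ground")
    precharge = lengths(pins, "precharge")
    main = lengths(pins, "power", "signal")
    presence = lengths(pins, "presence")
    if ground and min(ground) <= max(precharge + main + presence, default=0.0):
        problems.append("ground pins must mate before every other pin")
    if precharge and main and min(precharge) <= max(main):
        problems.append("pre-charge pins must mate second, before power/signal")
    others = ground + precharge + main
    if presence and max(presence) >= min(others, default=float("inf")):
        problems.append("presence pin must be the shortest (last to mate)")
    return problems


connector = [
    Pin("GND", "ground", 3.0),
    Pin("12V_PRE", "precharge", 2.5),
    Pin("12V", "power", 2.0),
    Pin("PERp0", "signal", 2.0),
    Pin("PRSNT#", "presence", 1.5),
]
print(check_mating_sequence(connector))  # []
```

Because the check is strict (ties count as violations), two contacts of equal length in different groups are flagged, which matches the intent that each group mates in a distinct phase.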
  • 23. Resources • ACPI 6.3: Add “Error Disconnect Recover” mechanism for DPC and new Hot-Plug Parameter Extensions (_HPX) Setting Record (Type 3) – https://uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf – (DPC EDR) https://mantis.uefi.org/mantis/view.php?id=1939* – (_HPX) https://mantis.uefi.org/mantis/view.php?id=1922* • PCI Express Base Specification Revision 4.0 Version 1.0 – https://members.pcisig.com/wg/PCI-SIG/document/10912?downloadRevision=active* • PCIe Base Spec. ECN: Async Hot-Plug Updates (DPC/CER, SFI) – https://members.pcisig.com/wg/PCI-SIG/document/12400* • PCI Firmware Spec. ECN: Downstream Port Containment related Enhancements – https://members.pcisig.com/wg/PCI-SIG/document/12614* • PCI Firmware Spec. ECN: _HPX and PCIe Completion Timeout related _OSC Enhancements – https://members.pcisig.com/wg/PCI-SIG/document/12712* • Dell EMC Tech Note: NVMe Hot-Plug Challenges and Industry Adoption – https://downloads.dell.com/manuals/common/dfd_-_nvme_hot-plug_challenges_and_industry_adoption.pdf • Implementing Hot-Plug in NVMe Storage Systems – https://www.flashmemorysummit.com/English/Collaterals/Proceedings/2018/20180808_NVME-201-2_Yung.pdf • The Modernization of PCIe Hot-Plug in Linux – https://lwn.net/Articles/767885/ * Requires member access to the relevant standards body website
  • 24. Linux Enablement • DPC Containment Error Recovery (CER) – Add Error Disconnect Recover (EDR) support: https://patchwork.kernel.org/cover/10833723/ – Add _OSC based negotiation support for DPC: https://patchwork.kernel.org/patch/10833717/ – Add Error Disconnect Recover (EDR) ACPI notifier support: https://patchwork.kernel.org/patch/10833725/ – Add Error Disconnect Recover (EDR) support: https://patchwork.kernel.org/patch/10833721/ • Hot-Plug Parameter Extensions (HPX) – Implement support for _HPX Type 3 tables: https://patchwork.kernel.org/cover/10843875/ – Do not export pci_get_hp_params(): https://patchwork.kernel.org/patch/10843877/ – Remove the need for 'struct hotplug_params': https://patchwork.kernel.org/patch/10843887/ – Implement Type 3 _HPX record: https://patchwork.kernel.org/patch/10843883/ – Advertise HPX type 3 support via _OSC: https://patchwork.kernel.org/patch/10855469/

Editor's Notes

  1. Fixing issues found with SHPC and HPS takes a long time and is difficult because it requires working with many different parties, which delays time to market and adds expense.