© 2019 Joyent, Inc.
Joyent Technical Discussion
NVMe Hotplug
Jordan Hendricks
August 20, 2019
Agenda
1. Motivation
2. Goals
3. Strategy
4. Implementation
5. Demos!
Left: “Campfire Pinecone.png” by Emeldi is licensed under Creative Commons Attribution-Share Alike 3.0 Unported
Right: “A Plug.jpg” by Maddin the brain is licensed under Creative Commons Attribution-Share Alike 3.0 Unported
Motivation: Why NVMe?
● “NVM” = Non-Volatile Memory
● Standard for non-volatile storage over PCIe
● Designed for NVM instead of hard disks
○ e.g., more queues and deeper queues for I/O
● Several form factors
○ We are interested in U.2 (2.5 inch HDD form factor)
● illumos already has driver support
Motivation: Why hotplug?
● “Hotplug” = generally, insertion or removal of devices on a
live system
○ Can be “coordinated” or “surprise”
■ Coordinated = notify OS first
■ Surprise = just pull/insert device
● Operational gains: no need to power off a system first
○ Guards against bad panics or errors if wrong device is
pulled
NVMe SSD in slot
Project Goals
1. Support coordinated insertion/removal of NVMe SSDs
2. Support surprise removal of NVMe SSDs
...for drives not in a zpool
...for drives in a zpool
...without ongoing I/O
...with ongoing I/O
Project Strategy
● Hotplug support already exists in illumos in the form of the
hotplug framework, integrated 10 years ago
● To find issues, we tried various forms of hotplug with an
NVMe drive
Project Strategy
● Coordinated removal and insertion mostly worked from the
start*:
○ Removal
cfgadm -c unconfigure <slot> // offline device
cfgadm -c disconnect <slot> // power off slot
○ Insertion
cfgadm -c connect <slot> // power on slot
cfgadm -c configure <slot> // online device
*Caveat: except for OS-7494, a PCI configurator issue
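The two sequences above can be written down as a small helper. This is a minimal sketch: the function names removal_commands and insertion_commands are hypothetical, and the code only builds the argument lists shown on the slide; it does not run cfgadm. The slot name "Slot12" is the one used in the demos later in the talk.

```python
def removal_commands(slot):
    """Coordinated removal: offline the device, then power off the slot."""
    return [
        ["cfgadm", "-c", "unconfigure", slot],  # offline device
        ["cfgadm", "-c", "disconnect", slot],   # power off slot
    ]


def insertion_commands(slot):
    """Coordinated insertion: power on the slot, then online the device."""
    return [
        ["cfgadm", "-c", "connect", slot],      # power on slot
        ["cfgadm", "-c", "configure", slot],    # online device
    ]


if __name__ == "__main__":
    # Print the commands in the order an operator would type them.
    for argv in removal_commands("Slot12") + insertion_commands("Slot12"):
        print(" ".join(argv))
```

On an illumos host, each list could be handed to subprocess.run(argv, check=True) so that a failed step stops the sequence before the next one runs.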
Project Strategy
● These surprise removal experiments didn’t work:
○ removing a device and plugging it back in: failed to detach
on pull; confusingly detached when plugged back in
○ pull device with ongoing I/O from dd(1M): dd hung
waiting for I/O in the kernel
○ pull a device from a mirrored zpool with no I/O: zpool
status hung waiting on I/O in the kernel
Implementation: Hotplug Framework
EMPTY: No device in slot
PRESENT: Device present in slot
POWERED: Slot powered on
ENABLED: Device ready for use
Note that the slot state could jump straight from ENABLED to EMPTY, but this is the general order of operations.
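The ordering of the four states can be modeled as a ladder that is climbed or descended one rung at a time. The sketch below is a toy model, not the framework's code: SlotState and transition_path are hypothetical names, and the model covers only the orderly path, not the direct ENABLED-to-EMPTY jump that a surprise removal causes.

```python
from enum import IntEnum


class SlotState(IntEnum):
    """Slot states in the general order of operations."""
    EMPTY = 0    # no device in slot
    PRESENT = 1  # device present in slot
    POWERED = 2  # slot powered on
    ENABLED = 3  # device ready for use


def transition_path(current, target):
    """Return the states visited when stepping one rung at a time."""
    step = 1 if target > current else -1
    return [SlotState(v) for v in range(current + step, target + step, step)]


if __name__ == "__main__":
    up = transition_path(SlotState.EMPTY, SlotState.ENABLED)
    down = transition_path(SlotState.ENABLED, SlotState.EMPTY)
    print("insert:", [s.name for s in up])
    print("remove:", [s.name for s in down])
```

Using an ordered enum makes "upgrade" and "downgrade" a plain integer comparison, which is the kind of check the following slides describe going wrong.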
Implementation: Hotplug Framework
● Removal case: no detach after pull
○ PCIe bridge receives a PCIe hotplug interrupt indicating the
slot status has changed
○ Hotplug code would request a state change for slot to
EMPTY from interrupt path
○ State change handler fetches state again from hardware,
sees that the connection is already EMPTY, and does nothing
Implementation: Hotplug Framework
● Insertion case: detach after inserted
○ PCIe bridge receives a PCIe hotplug interrupt indicating the slot
status has changed
○ Hotplug code would request a state change for slot to
PRESENT from interrupt path
○ State change handler fetches state again from hardware, sees
that the connection state is PRESENT, and assumes it is
already ENABLED
○ Then, when the state change request happens, it thinks it’s going
backward (ENABLED -> PRESENT), and detaches drivers
Implementation: Hotplug Framework
● Solution: Change state change transition checks in hotplug
framework to be aware of hot removal
● Internal state of hotplug framework can be out of sync if state
changes happen through surprise removal
● When in doubt, clean up structures
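The removal case and the idea behind the fix can be illustrated with a toy model. Everything here is an assumption made for illustration: the Slot class, its fields, and both handler functions are hypothetical and far simpler than the real framework. The sketch only shows why re-reading the hardware before comparing states turns a surprise removal into a no-op, and how checking for leftover attachments avoids that.

```python
from dataclasses import dataclass
from enum import IntEnum


class SlotState(IntEnum):
    EMPTY = 0
    PRESENT = 1
    POWERED = 2
    ENABLED = 3


@dataclass
class Slot:
    hw_state: SlotState    # what the hardware reports right now
    sw_state: SlotState    # what the framework last recorded
    attached: bool = True  # whether a driver is still attached


def naive_handle_empty(slot):
    """Refresh from hardware first, then compare with the target."""
    slot.sw_state = slot.hw_state
    if slot.sw_state == SlotState.EMPTY:
        return "no-op"     # looks finished, so the driver stays attached
    slot.attached = False
    slot.sw_state = SlotState.EMPTY
    return "detached"


def removal_aware_handle_empty(slot):
    """Treat 'already EMPTY' as done only if nothing is left to clean up."""
    slot.sw_state = slot.hw_state
    if slot.sw_state == SlotState.EMPTY and not slot.attached:
        return "no-op"
    slot.attached = False  # when in doubt, clean up
    slot.sw_state = SlotState.EMPTY
    return "detached"


if __name__ == "__main__":
    pulled = Slot(hw_state=SlotState.EMPTY, sw_state=SlotState.ENABLED)
    print("naive:", naive_handle_empty(pulled), "attached =", pulled.attached)
    pulled = Slot(hw_state=SlotState.EMPTY, sw_state=SlotState.ENABLED)
    print("aware:", removal_aware_handle_empty(pulled), "attached =", pulled.attached)
```

The design point is that the hardware state alone cannot tell the handler whether cleanup already happened, so the check has to consult the framework's own bookkeeping as well.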
Implementation: I/O Hang
● Both dd and zpool status hung on I/O in the kernel after hot
removal
● Want a way to notify the NVMe driver that its device is gone
● Solution?
○ Implement removal event callbacks using Nexus Driver
Interface Events
○ Add callback to the NVMe driver to fire on removal events
○ Plumb up support in nexus driver to fire removal events
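What a removal callback buys the driver can be sketched in a few lines. This is a toy model under stated assumptions: ToyNvme, submit, and on_remove_event are hypothetical names, and real commands are completed through the driver's queues rather than a dictionary. The point is only that every command still waiting on the device is completed with an error instead of being left to hang.

```python
import errno


class ToyNvme:
    """Hypothetical driver model that tracks commands awaiting completion."""

    def __init__(self):
        self.outstanding = {}  # command id -> completion callback

    def submit(self, cmd_id, on_complete):
        """Queue a command; the device would normally complete it later."""
        self.outstanding[cmd_id] = on_complete

    def on_remove_event(self):
        """Removal callback: fail every outstanding command with EIO."""
        pending, self.outstanding = self.outstanding, {}
        for cmd_id, on_complete in pending.items():
            on_complete(cmd_id, errno.EIO)


if __name__ == "__main__":
    results = []
    drv = ToyNvme()
    drv.submit(1, lambda cid, err: results.append((cid, err)))
    drv.submit(2, lambda cid, err: results.append((cid, err)))
    drv.on_remove_event()  # the device was pulled
    print(results)
```

Without the callback, the two waiters above would never be called, which corresponds to dd and zpool status hanging in the kernel.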
Implementation: I/O Hang
Device tree (parent to child):
rootnex
  npe
    pcieb, pciehpc
      nvme
        blkdev
● npe: Implement bus ops for NDI events. (Events are passed up the tree
until a nexus can handle them.)
● nvme: Register for remove events on attach(9E). When the remove
event fires, fail all outstanding commands.
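The "passed up the tree until a nexus can handle them" behavior can be sketched as a walk over parent pointers. This is an illustrative model only: Node, register_remove_callback, and fire_remove are hypothetical names, and the tree order (rootnex, npe, pcieb, nvme) is a reading of the diagram rather than output from a live system.

```python
class Node:
    """Hypothetical device tree node."""

    def __init__(self, name, parent=None, handles_events=False):
        self.name = name
        self.parent = parent
        self.handles_events = handles_events  # nexus implements event bus ops
        self.callbacks = []


def register_remove_callback(leaf, callback):
    """Walk up from the leaf until a nexus accepts the registration."""
    node = leaf.parent
    while node is not None:
        if node.handles_events:
            node.callbacks.append(callback)
            return node
        node = node.parent
    return None  # no nexus in the path handles events


def fire_remove(nexus):
    """Invoke every callback registered with this nexus."""
    for callback in nexus.callbacks:
        callback()


if __name__ == "__main__":
    rootnex = Node("rootnex")
    npe = Node("npe", parent=rootnex, handles_events=True)
    pcieb = Node("pcieb", parent=npe)
    nvme = Node("nvme", parent=pcieb)

    handler = register_remove_callback(nvme, lambda: print("nvme: device removed"))
    print("registered with:", handler.name)
    fire_remove(handler)
```

In this model pcieb does not handle events itself, so the request skips it and lands on npe, matching the note that npe is where the bus ops were implemented.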
Demo: Coordinated Removal
1: List all connections on the system.
# cfgadm
2: Meanwhile, trace whether the driver is detached…
# dtrace -n 'fbt::nvme_detach:entry'
3: And see how many nvme driver instances we have to start.
# mdb -k
> ::prtconf -d nvme // 4 instances
Demo: Coordinated Removal
4: Power off connection “Slot12”.
# cfgadm -c disconnect Slot12
5: Confirm with cfgadm.
# cfgadm
6: Confirm we see a detach in the dtrace output.
7: Confirm number of instances in mdb.
# mdb -k
> ::prtconf -d nvme // only 3 instances
Demo: Uncoordinated Removal
1: List all connections on the system.
# cfgadm
2: Meanwhile, trace whether the driver is detached…
# dtrace -n 'fbt::nvme_detach:entry'
3: And see how many nvme driver instances we have to start.
# mdb -k
> ::prtconf -d nvme // 4 instances
Demo: Uncoordinated Removal
4: Hot removal of drive.
5: Observe dtrace output.
6: Observe nvme instances in mdb.
7: Observe in cfgadm.
Demo: Uncoordinated Removal with I/O
1: Start dd command on disk.
# dd if=/dev/urandom of=/dev/rdsk/c3t1d0p0 bs=1M count=10240
2: Pull disk.
3: Observe EIO output.
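From the application's side, the fixed behavior means a write fails with EIO instead of blocking forever. The sketch below imitates that with a fake device, since pulling a real disk is not something a snippet can do: PulledDisk and copy_blocks are hypothetical names, and the number of writes before the "pull" is arbitrary.

```python
import errno


class PulledDisk:
    """Hypothetical stand-in for a raw disk that vanishes after N writes."""

    def __init__(self, writes_before_pull):
        self.remaining = writes_before_pull

    def write(self, data):
        if self.remaining == 0:
            raise OSError(errno.EIO, "I/O error")
        self.remaining -= 1
        return len(data)


def copy_blocks(dev, block, count):
    """Write `count` blocks; stop and report if the device returns EIO."""
    written = 0
    for _ in range(count):
        try:
            written += dev.write(block)
        except OSError as exc:
            if exc.errno == errno.EIO:
                return written, "EIO"
            raise  # any other error is unexpected in this sketch
    return written, None


if __name__ == "__main__":
    total, err = copy_blocks(PulledDisk(3), b"\0" * 512, 10)
    print("bytes written:", total, "error:", err)
```

The important property is that the loop terminates: the caller gets an error it can act on rather than a process stuck in the kernel.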
Demo: Uncoordinated Removal from zpool
1: Create mirrored zpool.
# zpool create test mirror c3t1d0 c4t1d0
2: Check pool status.
# zpool status test
3: Create some files.
# cd /test; echo foo > foo; cat foo;
4: Pull disk!
Demo: Uncoordinated Removal from zpool
5: Check pool status.
# zpool status test
6: Read and write from the pool.
# cd /test; echo bar > bar; cat foo; cat bar;
7: Online device.
# cfgadm -c configure Slot12
8: Check pool status.
# zpool status test
Remaining Work
● Hot removal from zpool with ongoing I/O panics due to OS-2743
● nvme driver sometimes fails to detach after dd exits because
blkdev still has references in the device tree (OS-7956)
● diskinfo sometimes does not list NVMe drives (OS-7940)
● Auto-online device on surprise insertion
● More work to improve NVMe operationally!
Questions?
