SlideShare a Scribd company logo
I have come to bury the
BIOS, not to open it
The need for holistic systems
Bryan Cantrill
Oxide Computer Company
OXIDE
In the beginning…
• In the beginning, computing systems were holistic: hardware and
software were designed together, to work with one another
• System software was delivered with the computer – but often had to be
newly developed for a new machine
• Requiring new software for new hardware created schedule delays, viz.
OS/360 and The Mythical Man Month
• With the advent of Unix, system software became portable – it could be
ported rather than developed de novo for each new computer…
OXIDE
Unix spreads – and feuds
• The portability of Unix accelerated the minicomputer and workstation
revolutions, with each manufacturer having its own variants
• The systems of this era remained broadly holistic: the hardware and
software were (broadly) designed with the other in mind
• …but despite the original ethos of Unix, the variants themselves
remained entirely proprietary – and the differences between them ignited
the Unix Wars of the 1980s and 1990s
OXIDE
Elsewhere, homebrew computing
• With the rise of the microcomputer, computing became much more
broadly available in the 1970s – but nearly absurd variety with respect to
hardware made software standardization challenging
• The hardware-specific half of CP/M – the dominant microcomputer OS
of the 1970 – was the Basic Input Output System, and could be
delivered separately
• This gave rise to hardware vendors delivering ROMs that contained
platform enablement code roughly standardized as a “System BIOS”
OXIDE
The IBM PC era
• With the emergence of the IBM PC – and its de facto standardization by
Compaq – the system software/BIOS split became irreconcilable
• Essential hardware enabling-software was driven into the BIOS
• The BIOS interface became what system software bound to – it became
the definition of “compatibility”
• Worse, the software components on both sides of the BIOS/OS divide
were nearly exclusively proprietary, serving to harden the boundary
OXIDE
It gets worse: SMM
• In order to be able implement system software functionality delivered by
the hardware – e.g., laptop suspend and resume – system management
mode was invented
• SMM allows effectively arbitrary, hidden code execution at arbitrary time
without even allowing system software awareness
• This is the opposite of a holistic system: it is one that has been
deliberately and perniciously divided!
OXIDE
EFI/UEFI
• All of this might have been fine had x86 remained relegated to personal
computing…
• …but Intel and AMD out-executed the RISC vendors in the 2000s,
forcing PC constructs into the server space
• Starting with (ill-fated) Itanium, Intel introduced EFI in an attempt to
modernize…
OXIDE
UEFI: What might have been
Source: Beyond BIOS: Developing with the Unified Extensible Firmware Interface
OXIDE
UEFI: What happened instead
• While its goals were laudable, UEFI was overconstrained
• In particular, the need for legacy and Windows compatibility required
UEFI to support all past abstractions
• UEFI has become the worst of all worlds: complicated, proprietary
software that remains at once isolated from – yet also still entirely
entangled with! – system software
• UEFI has become so entangled with lowest-level platform enablement
that non-UEFI platforms are effectively impossible
OXIDE
It gets worse, again: Hidden cores
• A dividend of Moore’s Law: formerly discrete components were
increasingly pulled first into large ASICs – and then pulled on-die into a
system-on-a-chip
• Especially as I/O was brought directly into the die, CPUs developed an
increasing numbers of non-architectural cores to manage it
• But these cores are hidden to system software – the operating system
is being confined to an increasingly narrow slice of the true hardware
capabilities of the system…
OXIDE
…which is not lost on everyone!
Timothy Roscoe, OSDI 2021 Keynote, It's Time for Operating Systems to Rediscover Hardware
OXIDE
The battle for non-architectural cores
• Roscoe (rightfully) calls this a “security catastrophe”
• The non-architectural cores are – on x86 CPUs anyway – entirely
proprietary, with all of its concomitant problems; that the system is
“open source” is increasingly a myth
• Roscoe correctly identifies the problem, but understates the severity:
this isn’t a retreat of Linux – it is a resurgence of proprietary operating
systems, wrapping themselves in firmware
OXIDE
Is an open source BIOS the answer?
• An open source BIOS is certainly valuable and laudable – but if history is
any guide, it is also not sustainable
• The problem is not (merely) the proprietary BIOS – it is the ubiquity of the
abstraction that splits our stack into open and proprietary halves
• The presence of a deeply proprietary platform enablement layer allows
for wildly complicated SoCs to have vast, undocumented elements – the
implementation of the firmware has become the documentation!
• We need a different model
OXIDE
The need for (a to return to) holistic systems
• The platform enablement boundary as we know it today is largely
vestigial – it serves to create abstractions that are broadly unnecessary
• We need systems that obliterate these boundaries – that are rather
holistic systems in which software and hardware are co-designed
• Resetting system state over the course of booting is not holistic!
• Holistic systems require us to be willing to take up Roscoe’s challenge
and adopt SoC specificity in our operating systems
OXIDE
Oxide’s approach
• At Oxide, we are taking a from-scratch, rack-scale approach to
server-side computing, with AMD Milan-based sleds of our own design
• We do not have a traditional BMC, but rather a fit-to-purpose service
processor (an STM32H753) and RoT (LP55S28), both running our own
(Rust-based, open source) OS, Hubris (see Cliff Biffle’s OSFC 2021 talk!)
• Our approach is holistic but open
• Could we develop a truly holistic system on x86?
OXIDE
Aside: AMD Details
• On AMD, the Platform Security Processor (PSP) is a non-architectural
core that executes proprietary software to perform system initialization –
including DRAM training
• System management controller (SP in our case) puts the PSP payload
into SPI flash and brings the CPU out of reset
• The PSP will perform its initialization and eventually vector into host
software executing on the bootstrap core (BSC)
• Historically, post-PSP initialization done by AMD’s AGESA firmware –
which makes a holistic system impossible
OXIDE
Challenge #1: Initialization
• To implement holistic boot, system software must perform the activities
historically done by AGESA
• Modern CPUs are very complicated! Post-PSP initialization includes
configuring I/O interconnects, core complexes, etc.
• For AMD Milan, this specifically includes DXIO engine configuration,
NBIO PCIe strapping, hotplug configuration
• The software that has implemented this level of initialization has
historically been done by the CPU vendor; these interfaces are not
always documented thoroughly – if at all!
OXIDE
Challenge #2: Boot Phasing
• Payload that boots from PSP is size-constrained to ~13MB
• Stage-based approaches (e.g., oreboot + LinuxBoot) use Linux drivers
to load (and execute) a production kernel
• This necessitates a pseudo-reset of the system – as well as the creation
or emulation of an interface (e.g., ACPI) to pass system state to later
stages
• We instead adopt a phase-based approach whereby part of the
system is loaded from SPI NOR and is able to load the remainder from
SSDs – but the system is never discarded
OXIDE
Holistic booting!
• Helios is our illumos derivative that includes the Oxide bhyve-based
hypervisor – and runs our rack-wide control plane
• We have holistic Helios booting on our EVT compute sleds, including all
necessary functionality for platform initialization (I/O, SMP, etc.)
• Phased boot has enough in SPI to be able to import ZFS pools from M.2
devices
• Helios – along with all Oxide-authored software – will be open source
when we ship our first racks at the end of the year!
OXIDE
Towards holistic systems
• Holistic systems have clear advantages in terms of reliability, security,
observability, manageability, sustainability, etc.
• Based on our experience to date, holistic systems are challenging to
implement but emphatically attainable
• Documentation from microprocessor vendors is essential; they
have much to gain by encouraging more software on their platforms!
• Oxide may represent the first open, holistic server-side system in the
post-PC x86 era – but unlikely to be the last!

More Related Content

What's hot

Moving a Monolith to Kubernetes
Moving a Monolith to KubernetesMoving a Monolith to Kubernetes
Moving a Monolith to Kubernetes
M. Scott Ford
 
Docker 101 - Nov 2016
Docker 101 - Nov 2016Docker 101 - Nov 2016
Docker 101 - Nov 2016
Docker, Inc.
 
Linux Module Programming
Linux Module ProgrammingLinux Module Programming
Linux Module Programming
Amir Payberah
 
Red Hat OpenStack 17 저자직강+스터디그룹_5주차
Red Hat OpenStack 17 저자직강+스터디그룹_5주차Red Hat OpenStack 17 저자직강+스터디그룹_5주차
Red Hat OpenStack 17 저자직강+스터디그룹_5주차
Nalee Jang
 
Traffic Control with Envoy Proxy
Traffic Control with Envoy ProxyTraffic Control with Envoy Proxy
Traffic Control with Envoy Proxy
Mark McBride
 
Webinar "Introduction to OpenStack"
Webinar "Introduction to OpenStack"Webinar "Introduction to OpenStack"
Webinar "Introduction to OpenStack"
CREATE-NET
 
[OpenStack] 공개 소프트웨어 오픈스택 입문 & 파헤치기
[OpenStack] 공개 소프트웨어 오픈스택 입문 & 파헤치기[OpenStack] 공개 소프트웨어 오픈스택 입문 & 파헤치기
[OpenStack] 공개 소프트웨어 오픈스택 입문 & 파헤치기
Ian Choi
 
Gocd – Kubernetes/Nomad Continuous Deployment
Gocd – Kubernetes/Nomad Continuous DeploymentGocd – Kubernetes/Nomad Continuous Deployment
Gocd – Kubernetes/Nomad Continuous Deployment
Leandro Totino Pereira
 
Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2
Giuseppe Paterno'
 
[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅
NAVER D2
 
OpenShift Virtualization- Technical Overview.pdf
OpenShift Virtualization- Technical Overview.pdfOpenShift Virtualization- Technical Overview.pdf
OpenShift Virtualization- Technical Overview.pdf
ssuser1490e8
 
Pod Security AdmissionによるKubernetesのポリシー制御(Kubernetes Novice Tokyo #21 発表資料)
Pod Security AdmissionによるKubernetesのポリシー制御(Kubernetes Novice Tokyo #21 発表資料)Pod Security AdmissionによるKubernetesのポリシー制御(Kubernetes Novice Tokyo #21 発表資料)
Pod Security AdmissionによるKubernetesのポリシー制御(Kubernetes Novice Tokyo #21 発表資料)
NTT DATA Technology & Innovation
 
Installing Aix
Installing AixInstalling Aix
Installing Aix
Frederick James Rathweg
 
Red Hat OpenStack 17 저자직강+스터디그룹_1주차
Red Hat OpenStack 17 저자직강+스터디그룹_1주차Red Hat OpenStack 17 저자직강+스터디그룹_1주차
Red Hat OpenStack 17 저자직강+스터디그룹_1주차
Nalee Jang
 
Introduction to openshift
Introduction to openshiftIntroduction to openshift
Introduction to openshift
MamathaBusi
 
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
DoKC
 
Qnx os
Qnx os Qnx os
Qnx os
Student
 
01. Kubernetes-PPT.pptx
01. Kubernetes-PPT.pptx01. Kubernetes-PPT.pptx
01. Kubernetes-PPT.pptx
TamalBanerjee16
 
PubSub and Notifications in Ceph
PubSub and Notifications in CephPubSub and Notifications in Ceph
PubSub and Notifications in Ceph
Yuval Lifshitz
 
Making Linux do Hard Real-time
Making Linux do Hard Real-timeMaking Linux do Hard Real-time
Making Linux do Hard Real-time
National Cheng Kung University
 

What's hot (20)

Moving a Monolith to Kubernetes
Moving a Monolith to KubernetesMoving a Monolith to Kubernetes
Moving a Monolith to Kubernetes
 
Docker 101 - Nov 2016
Docker 101 - Nov 2016Docker 101 - Nov 2016
Docker 101 - Nov 2016
 
Linux Module Programming
Linux Module ProgrammingLinux Module Programming
Linux Module Programming
 
Red Hat OpenStack 17 저자직강+스터디그룹_5주차
Red Hat OpenStack 17 저자직강+스터디그룹_5주차Red Hat OpenStack 17 저자직강+스터디그룹_5주차
Red Hat OpenStack 17 저자직강+스터디그룹_5주차
 
Traffic Control with Envoy Proxy
Traffic Control with Envoy ProxyTraffic Control with Envoy Proxy
Traffic Control with Envoy Proxy
 
Webinar "Introduction to OpenStack"
Webinar "Introduction to OpenStack"Webinar "Introduction to OpenStack"
Webinar "Introduction to OpenStack"
 
[OpenStack] 공개 소프트웨어 오픈스택 입문 & 파헤치기
[OpenStack] 공개 소프트웨어 오픈스택 입문 & 파헤치기[OpenStack] 공개 소프트웨어 오픈스택 입문 & 파헤치기
[OpenStack] 공개 소프트웨어 오픈스택 입문 & 파헤치기
 
Gocd – Kubernetes/Nomad Continuous Deployment
Gocd – Kubernetes/Nomad Continuous DeploymentGocd – Kubernetes/Nomad Continuous Deployment
Gocd – Kubernetes/Nomad Continuous Deployment
 
Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2
 
[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅[232] 성능어디까지쥐어짜봤니 송태웅
[232] 성능어디까지쥐어짜봤니 송태웅
 
OpenShift Virtualization- Technical Overview.pdf
OpenShift Virtualization- Technical Overview.pdfOpenShift Virtualization- Technical Overview.pdf
OpenShift Virtualization- Technical Overview.pdf
 
Pod Security AdmissionによるKubernetesのポリシー制御(Kubernetes Novice Tokyo #21 発表資料)
Pod Security AdmissionによるKubernetesのポリシー制御(Kubernetes Novice Tokyo #21 発表資料)Pod Security AdmissionによるKubernetesのポリシー制御(Kubernetes Novice Tokyo #21 発表資料)
Pod Security AdmissionによるKubernetesのポリシー制御(Kubernetes Novice Tokyo #21 発表資料)
 
Installing Aix
Installing AixInstalling Aix
Installing Aix
 
Red Hat OpenStack 17 저자직강+스터디그룹_1주차
Red Hat OpenStack 17 저자직강+스터디그룹_1주차Red Hat OpenStack 17 저자직강+스터디그룹_1주차
Red Hat OpenStack 17 저자직강+스터디그룹_1주차
 
Introduction to openshift
Introduction to openshiftIntroduction to openshift
Introduction to openshift
 
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
Running PostgreSQL in Kubernetes: from day 0 to day 2 with CloudNativePG - Do...
 
Qnx os
Qnx os Qnx os
Qnx os
 
01. Kubernetes-PPT.pptx
01. Kubernetes-PPT.pptx01. Kubernetes-PPT.pptx
01. Kubernetes-PPT.pptx
 
PubSub and Notifications in Ceph
PubSub and Notifications in CephPubSub and Notifications in Ceph
PubSub and Notifications in Ceph
 
Making Linux do Hard Real-time
Making Linux do Hard Real-timeMaking Linux do Hard Real-time
Making Linux do Hard Real-time
 

Similar to I have come to bury the BIOS, not to open it: The need for holistic systems

Opening last bits of the infrastructure
Opening last bits of the infrastructureOpening last bits of the infrastructure
Opening last bits of the infrastructure
Erwan Velu
 
Building Embedded Linux Systems Introduction
Building Embedded Linux Systems IntroductionBuilding Embedded Linux Systems Introduction
Building Embedded Linux Systems Introduction
Sherif Mousa
 
The linux kernel hidden inside windows 10
The linux kernel hidden inside windows 10The linux kernel hidden inside windows 10
The linux kernel hidden inside windows 10
mark-smith
 
Linux booting process, Dual booting, Components involved
Linux booting process, Dual booting, Components involvedLinux booting process, Dual booting, Components involved
Linux booting process, Dual booting, Components involved
divyammo
 
Regarding About Operating System Structure
Regarding About Operating System StructureRegarding About Operating System Structure
Regarding About Operating System Structure
sankarkvdc
 
One Shellcode to Rule Them All: Cross-Platform Exploitation
One Shellcode to Rule Them All: Cross-Platform ExploitationOne Shellcode to Rule Them All: Cross-Platform Exploitation
One Shellcode to Rule Them All: Cross-Platform Exploitation
Quinn Wilton
 
Introducing Plan9 from Bell Labs
Introducing Plan9 from Bell LabsIntroducing Plan9 from Bell Labs
Introducing Plan9 from Bell Labs
Anant Narayanan
 
Overview.ppt
Overview.pptOverview.ppt
Overview.ppt
shruti533256
 
Embedded system - embedded system programming
Embedded system - embedded system programmingEmbedded system - embedded system programming
Embedded system - embedded system programming
Vibrant Technologies & Computers
 
Manta: a new internet-facing object storage facility that features compute by...
Manta: a new internet-facing object storage facility that features compute by...Manta: a new internet-facing object storage facility that features compute by...
Manta: a new internet-facing object storage facility that features compute by...
Hakka Labs
 
Embedded linux
Embedded linuxEmbedded linux
Embedded linux
Wingston
 
10. compute-part-2
10. compute-part-210. compute-part-2
10. compute-part-2
Muhammad Ahad
 
The end of embedded Linux (as we know it)
The end of embedded Linux (as we know it)The end of embedded Linux (as we know it)
The end of embedded Linux (as we know it)
Chris Simmonds
 
OpenIO Summit'17 - ARM, Object Storage and more
OpenIO Summit'17 - ARM, Object Storage and moreOpenIO Summit'17 - ARM, Object Storage and more
OpenIO Summit'17 - ARM, Object Storage and more
OpenIO Object Storage
 
NXP IMX6 Processor - Embedded Linux
NXP IMX6 Processor - Embedded LinuxNXP IMX6 Processor - Embedded Linux
NXP IMX6 Processor - Embedded Linux
NEEVEE Technologies
 
Linux for embedded_systems
Linux for embedded_systemsLinux for embedded_systems
Linux for embedded_systems
Vandana Salve
 
Linux操作系统01 简介
Linux操作系统01 简介Linux操作系统01 简介
Linux操作系统01 简介
lclsg123
 
Unix operating system
Unix operating systemUnix operating system
Unix operating system
ABhay Panchal
 
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-VRISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
ScyllaDB
 
Window ce
Window ceWindow ce

Similar to I have come to bury the BIOS, not to open it: The need for holistic systems (20)

Opening last bits of the infrastructure
Opening last bits of the infrastructureOpening last bits of the infrastructure
Opening last bits of the infrastructure
 
Building Embedded Linux Systems Introduction
Building Embedded Linux Systems IntroductionBuilding Embedded Linux Systems Introduction
Building Embedded Linux Systems Introduction
 
The linux kernel hidden inside windows 10
The linux kernel hidden inside windows 10The linux kernel hidden inside windows 10
The linux kernel hidden inside windows 10
 
Linux booting process, Dual booting, Components involved
Linux booting process, Dual booting, Components involvedLinux booting process, Dual booting, Components involved
Linux booting process, Dual booting, Components involved
 
Regarding About Operating System Structure
Regarding About Operating System StructureRegarding About Operating System Structure
Regarding About Operating System Structure
 
One Shellcode to Rule Them All: Cross-Platform Exploitation
One Shellcode to Rule Them All: Cross-Platform ExploitationOne Shellcode to Rule Them All: Cross-Platform Exploitation
One Shellcode to Rule Them All: Cross-Platform Exploitation
 
Introducing Plan9 from Bell Labs
Introducing Plan9 from Bell LabsIntroducing Plan9 from Bell Labs
Introducing Plan9 from Bell Labs
 
Overview.ppt
Overview.pptOverview.ppt
Overview.ppt
 
Embedded system - embedded system programming
Embedded system - embedded system programmingEmbedded system - embedded system programming
Embedded system - embedded system programming
 
Manta: a new internet-facing object storage facility that features compute by...
Manta: a new internet-facing object storage facility that features compute by...Manta: a new internet-facing object storage facility that features compute by...
Manta: a new internet-facing object storage facility that features compute by...
 
Embedded linux
Embedded linuxEmbedded linux
Embedded linux
 
10. compute-part-2
10. compute-part-210. compute-part-2
10. compute-part-2
 
The end of embedded Linux (as we know it)
The end of embedded Linux (as we know it)The end of embedded Linux (as we know it)
The end of embedded Linux (as we know it)
 
OpenIO Summit'17 - ARM, Object Storage and more
OpenIO Summit'17 - ARM, Object Storage and moreOpenIO Summit'17 - ARM, Object Storage and more
OpenIO Summit'17 - ARM, Object Storage and more
 
NXP IMX6 Processor - Embedded Linux
NXP IMX6 Processor - Embedded LinuxNXP IMX6 Processor - Embedded Linux
NXP IMX6 Processor - Embedded Linux
 
Linux for embedded_systems
Linux for embedded_systemsLinux for embedded_systems
Linux for embedded_systems
 
Linux操作系统01 简介
Linux操作系统01 简介Linux操作系统01 简介
Linux操作系统01 简介
 
Unix operating system
Unix operating systemUnix operating system
Unix operating system
 
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-VRISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
RISC-V on Edge: Porting EVE and Alpine Linux to RISC-V
 
Window ce
Window ceWindow ce
Window ce
 

More from bcantrill

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Present
bcantrill
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmaking
bcantrill
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Age
bcantrill
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
bcantrill
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Law
bcantrill
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
bcantrill
 
Visualizing Systems with Statemaps
Visualizing Systems with StatemapsVisualizing Systems with Statemaps
Visualizing Systems with Statemaps
bcantrill
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system software
bcantrill
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the union
bcantrill
 
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsThe Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systems
bcantrill
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after dark
bcantrill
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadership
bcantrill
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data path
bcantrill
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyond
bcantrill
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mind
bcantrill
 
Down Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab AllocatorDown Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab Allocator
bcantrill
 
The State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destructionThe State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destruction
bcantrill
 
Oral tradition in software engineering: Passing the craft across generations
Oral tradition in software engineering: Passing the craft across generationsOral tradition in software engineering: Passing the craft across generations
Oral tradition in software engineering: Passing the craft across generations
bcantrill
 
The Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decadeThe Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decade
bcantrill
 
Debugging (Docker) containers in production
Debugging (Docker) containers in productionDebugging (Docker) containers in production
Debugging (Docker) containers in production
bcantrill
 

More from bcantrill (20)

Predicting the Present
Predicting the PresentPredicting the Present
Predicting the Present
 
Sharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of ToolmakingSharpening the Axe: The Primacy of Toolmaking
Sharpening the Axe: The Primacy of Toolmaking
 
Hardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden AgeHardware/software Co-design: The Coming Golden Age
Hardware/software Co-design: The Coming Golden Age
 
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator tracesTockilator: Deducing Tock execution flows from Ibex Verilator traces
Tockilator: Deducing Tock execution flows from Ibex Verilator traces
 
No Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's LawNo Moore Left to Give: Enterprise Computing After Moore's Law
No Moore Left to Give: Enterprise Computing After Moore's Law
 
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software EngineeringAndreessen's Corollary: Ethical Dilemmas in Software Engineering
Andreessen's Corollary: Ethical Dilemmas in Software Engineering
 
Visualizing Systems with Statemaps
Visualizing Systems with StatemapsVisualizing Systems with Statemaps
Visualizing Systems with Statemaps
 
Platform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system softwarePlatform values, Rust, and the implications for system software
Platform values, Rust, and the implications for system software
 
dtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the uniondtrace.conf(16): DTrace state of the union
dtrace.conf(16): DTrace state of the union
 
The Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systemsThe Hurricane's Butterfly: Debugging pathologically performing systems
The Hurricane's Butterfly: Debugging pathologically performing systems
 
Papers We Love: ARC after dark
Papers We Love: ARC after darkPapers We Love: ARC after dark
Papers We Love: ARC after dark
 
Principles of Technology Leadership
Principles of Technology LeadershipPrinciples of Technology Leadership
Principles of Technology Leadership
 
Zebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data pathZebras all the way down: The engineering challenges of the data path
Zebras all the way down: The engineering challenges of the data path
 
Platform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyondPlatform as reflection of values: Joyent, node.js, and beyond
Platform as reflection of values: Joyent, node.js, and beyond
 
Debugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mindDebugging under fire: Keeping your head when systems have lost their mind
Debugging under fire: Keeping your head when systems have lost their mind
 
Down Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab AllocatorDown Memory Lane: Two Decades with the Slab Allocator
Down Memory Lane: Two Decades with the Slab Allocator
 
The State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destructionThe State of Cloud 2016: The whirlwind of creative destruction
The State of Cloud 2016: The whirlwind of creative destruction
 
Oral tradition in software engineering: Passing the craft across generations
Oral tradition in software engineering: Passing the craft across generationsOral tradition in software engineering: Passing the craft across generations
Oral tradition in software engineering: Passing the craft across generations
 
The Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decadeThe Container Revolution: Reflections after the first decade
The Container Revolution: Reflections after the first decade
 
Debugging (Docker) containers in production
Debugging (Docker) containers in productionDebugging (Docker) containers in production
Debugging (Docker) containers in production
 

Recently uploaded

LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
DanBrown980551
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
christinelarrosa
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
Safe Software
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
Alex Pruden
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
Vadym Kazulkin
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
Fwdays
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
DianaGray10
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
Enterprise Knowledge
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Tosin Akinosho
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
AstuteBusiness
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
Neo4j
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Neo4j
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
Neo4j
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 

Recently uploaded (20)

LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
Christine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptxChristine's Supplier Sourcing Presentaion.pptx
Christine's Supplier Sourcing Presentaion.pptx
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Essentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation ParametersEssentials of Automations: Exploring Attributes & Automation Parameters
Essentials of Automations: Exploring Attributes & Automation Parameters
 
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
zkStudyClub - LatticeFold: A Lattice-based Folding Scheme and its Application...
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024High performance Serverless Java on AWS- GoTo Amsterdam 2024
High performance Serverless Java on AWS- GoTo Amsterdam 2024
 
"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota"Choosing proper type of scaling", Olena Syrota
"Choosing proper type of scaling", Olena Syrota
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
What is an RPA CoE? Session 1 – CoE Vision
What is an RPA CoE?  Session 1 – CoE VisionWhat is an RPA CoE?  Session 1 – CoE Vision
What is an RPA CoE? Session 1 – CoE Vision
 
Demystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through StorytellingDemystifying Knowledge Management through Storytelling
Demystifying Knowledge Management through Storytelling
 
Monitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdfMonitoring and Managing Anomaly Detection on OpenShift.pdf
Monitoring and Managing Anomaly Detection on OpenShift.pdf
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |Astute Business Solutions | Oracle Cloud Partner |
Astute Business Solutions | Oracle Cloud Partner |
 
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge GraphGraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
GraphRAG for LifeSciences Hands-On with the Clinical Knowledge Graph
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Leveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and StandardsLeveraging the Graph for Clinical Trials and Standards
Leveraging the Graph for Clinical Trials and Standards
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 

I have come to bury the BIOS, not to open it: The need for holistic systems

  • 1. I have come to bury the BIOS, not to open it The need for holistic systems Bryan Cantrill Oxide Computer Company
  • 2. OXIDE In the beginning… • In the beginning, computing systems were holistic: hardware and software were designed together, to work with one another • System software was delivered with the computer – but often had to be newly developed for a new machine • Requiring new software for new hardware created schedule delays, viz. OS/360 and The Mythical Man Month • With the advent of Unix, system software became portable – it could be ported rather than developed de novo for each new computer…
  • 3. OXIDE Unix spreads – and feuds • The portability of Unix accelerated the minicomputer and workstation revolutions, with each manufacturer having its own variants • The systems of this era remained broadly holistic: the hardware and software were (broadly) designed with the other in mind • …but despite the original ethos of Unix, the variants themselves remained entirely proprietary – and the differences between them ignited the Unix Wars of the 1980s and 1990s
  • 4. OXIDE Elsewhere, homebrew computing • With the rise of the microcomputer, computing became much more broadly available in the 1970s – but nearly absurd variety with respect to hardware made software standardization challenging • The hardware-specific half of CP/M – the dominant microcomputer OS of the 1970 – was the Basic Input Output System, and could be delivered separately • This gave rise to hardware vendors delivering ROMs that contained platform enablement code roughly standardized as a “System BIOS”
  • 5. OXIDE The IBM PC era • With the emergence of the IBM PC – and its de facto standardization by Compaq – the system software/BIOS split became irreconcilable • Essential hardware enabling-software was driven into the BIOS • The BIOS interface became what system software bound to – it became the definition of “compatibility” • Worse, the software components on both sides of the BIOS/OS divide were nearly exclusively proprietary, serving to harden the boundary
  • 6. OXIDE It gets worse: SMM • In order to be able implement system software functionality delivered by the hardware – e.g., laptop suspend and resume – system management mode was invented • SMM allows effectively arbitrary, hidden code execution at arbitrary time without even allowing system software awareness • This is the opposite of a holistic system: it is one that has been deliberately and perniciously divided!
  • 7. OXIDE EFI/UEFI • All of this might have been fine had x86 remained relegated to personal computing… • …but Intel and AMD out-executed the RISC vendors in the 2000s, forcing PC constructs into the server space • Starting with (ill-fated) Itanium, Intel introduced EFI in an attempt to modernize…
  • 8. OXIDE UEFI: What might have been Source: Beyond BIOS: Developing with the Unified Extensible Firmware Interface
  • 9. OXIDE UEFI: What happened instead • While its goals were laudable, UEFI was overconstrained • In particular, the need for legacy and Windows compatibility required UEFI to support all past abstractions • UEFI has become the worst of all worlds: complicated, proprietary software that remains at once isolated from – yet also still entirely entangled with! – system software • UEFI has become so entangled with lowest-level platform enablement that non-UEFI platforms are effectively impossible
  • 10. OXIDE It gets worse, again: Hidden cores • A dividend of Moore’s Law: formerly discrete components were increasingly pulled first into large ASICs – and then pulled on-die into a system-on-a-chip • Especially as I/O was brought directly into the die, CPUs developed an increasing numbers of non-architectural cores to manage it • But these cores are hidden to system software – the operating system is being confined to an increasingly narrow slice of the true hardware capabilities of the system…
  • 11. OXIDE …which is not lost on everyone! Timothy Roscoe, OSDI 2021 Keynote, It's Time for Operating Systems to Rediscover Hardware
  • 12. OXIDE The battle for non-architectural cores • Roscoe (rightfully) calls this a “security catastrophe” • The non-architectural cores are – on x86 CPUs anyway – entirely proprietary, with all of its concomitant problems; that the system is “open source” is increasingly a myth • Roscoe correctly identifies the problem, but understates the severity: this isn’t a retreat of Linux – it is a resurgence of proprietary operating systems, wrapping themselves in firmware
  • 13. OXIDE Is an open source BIOS the answer? • An open source BIOS is certainly valuable and laudable – but if history is any guide, it is also not sustainable • The problem is not (merely) the proprietary BIOS – it is the ubiquity of the abstraction that splits our stack into open and proprietary halves • The presence of a deeply proprietary platform enablement layer allows for wildly complicated SoCs to have vast, undocumented elements – the implementation of the firmware has become the documentation! • We need a different model
  • 14. OXIDE The need for (a to return to) holistic systems • The platform enablement boundary as we know it today is largely vestigial – it serves to create abstractions that are broadly unnecessary • We need systems that obliterate these boundaries – that are rather holistic systems in which software and hardware are co-designed • Resetting system state over the course of booting is not holistic! • Holistic systems require us to be willing to take up Roscoe’s challenge and adopt SoC specificity in our operating systems
  • 15. OXIDE Oxide’s approach • At Oxide, we are taking a from-scratch, rack-scale approach to server-side computing, with AMD Milan-based sleds of our own design • We do not have a traditional BMC, but rather a fit-to-purpose service processor (an STM32H753) and RoT (LP55S28), both running our own (Rust-based, open source) OS, Hubris (see Cliff Biffle’s OSFC 2021 talk!) • Our approach is holistic but open • Could we develop a truly holistic system on x86?
  • 16. OXIDE Aside: AMD Details • On AMD, the Platform Security Processor (PSP) is a non-architectural core that executes proprietary software to perform system initialization – including DRAM training • System management controller (SP in our case) puts the PSP payload into SPI flash and brings the CPU out of reset • The PSP will perform its initialization and eventually vector into host software executing on the bootstrap core (BSC) • Historically, post-PSP initialization done by AMD’s AGESA firmware – which makes a holistic system impossible
  • 17. OXIDE Challenge #1: Initialization • To implement holistic boot, system software must perform the activities historically done by AGESA • Modern CPUs are very complicated! Post-PSP initialization includes configuring I/O interconnects, core complexes, etc. • For AMD Milan, this specifically includes DXIO engine configuration, NBIO PCIe strapping, hotplug configuration • The software that has implemented this level of initialization has historically been done by the CPU vendor; these interfaces are not always documented thoroughly – if at all!
  • 18. OXIDE Challenge #2: Boot Phasing • Payload that boots from PSP is size-constrained to ~13MB • Stage-based approaches (e.g., oreboot + LinuxBoot) use Linux drivers to load (and execute) a production kernel • This necessitates a pseudo-reset of the system – as well as the creation or emulation of an interface (e.g., ACPI) to pass system state to later stages • We instead adopt a phase-based approach whereby part of the system is loaded from SPI NOR and is able to load the remainder from SSDs – but the system is never discarded
  • 19. OXIDE Holistic booting! • Helios is our illumos derivative that includes the Oxide bhyve-based hypervisor – and runs our rack-wide control plane • We have holistic Helios booting on our EVT compute sleds, including all necessary functionality for platform initialization (I/O, SMP, etc.) • Phased boot has enough in SPI to be able to import ZFS pools from M.2 devices • Helios – along with all Oxide-authored software – will be open source when we ship our first racks at the end of the year!
  • 20. OXIDE Towards holistic systems • Holistic systems have clear advantages in terms of reliability, security, observability, manageability, sustainability, etc. • Based on our experience to date, holistic systems are challenging to implement but emphatically attainable • Documentation from microprocessor vendors is essential; they have much to gain by encouraging more software on their platforms! • Oxide may represent the first open, holistic server-side system in the post-PC x86 era – but unlikely to be the last!