Hardware/Software Co-Design for Efficient Microkernel Execution
Martin Děcký
martin.decky@huawei.com
February 2019
Martin Děcký, FOSDEM, February 3rd 2019
Who Am I
Passionate programmer and operating systems enthusiast
With a specific inclination towards multiserver microkernels
HelenOS developer since 2004
Research Scientist from 2006 to 2018
Charles University (Prague), Distributed Systems Research Group
Senior Research Engineer since 2017
Huawei Technologies (Munich), German Research Center, Central Software Institute, OS Kernel Lab
Microkernel Multiserver Systems are better than Monolithic Systems
Monolithic OS Design is Flawed
Biggs S., Lee D., Heiser G.: The Jury Is In: Monolithic OS Design Is Flawed: Microkernel-based Designs Improve Security, ACM 9th Asia-Pacific Workshop on Systems (APSys), 2018
“While intuitive, the benefits of the small TCB have not been quantified to
date. We address this by a study of critical Linux CVEs, where we examine
whether they would be prevented or mitigated by a microkernel-based
design. We find that almost all exploits are at least mitigated to less than
critical severity, and 40 % completely eliminated by an OS design based
on a verified microkernel, such as seL4.”
Problem Statement
Microkernel design ideas go as far back as 1969
RC 4000 Multiprogramming System nucleus (Per Brinch Hansen)
Isolation of unprivileged processes, inter-process communication, hierarchical control
Even after 50 years, they are still not fully accepted in the mainstream
Hardware and software used to be designed independently
Designing CPUs used to be an extremely complicated and costly process
Operating systems used to be written after the CPUs were designed
Hardware designs used to be rather conservative
Problem Statement (2)
Mainstream ISAs used to be designed in a rather conservative way
Can you name some really revolutionary ISA features since IBM System/370 Advanced Function?
Requirements on the new ISAs usually follow the needs of the mainstream operating systems running on the past ISAs
No wonder microkernels suffer performance penalties compared to monolithic systems
The more fine-grained the architecture, the more penalties it suffers
Let us design the hardware with microkernels in mind!
The Vicious Cycle
CPUs do not support microkernels properly
Microkernels suffer performance penalties
Microkernels are not in the mainstream
No requirements on CPUs from microkernels
Any Ideas?
Communication between Address Spaces
Control and data flow between subsystems
Monolithic kernel
Function calls
Passing arguments in registers and on the stack
Passing direct pointers to memory structures
Multiserver microkernel
IPC via microkernel syscalls
Passing arguments in a subset of registers
Privilege level switch, address space switch
Scheduling (in case of asynchronous IPC)
Data copying or memory sharing with page granularity
Communication between Address Spaces (2)
Is the kernel round-trip of the IPC necessary?
Suggestion for synchronous IPC: Extended Jump/Call and Return instructions that also switch the address space
Communicating parties identified by a “call gate” (capability) containing the target address space and the PC of the IPC handler (implicit for return)
Call gates stored in a TLB-like hardware cache (CLB)
CLB populated by the microkernel, similarly to a TLB-only memory management architecture
Suggestion for asynchronous IPC: Using CPU cache lines as the buffers for the messages
Async Jump/Call, Async Return and Async Receive instructions
Using the CPU cache like an extended register stack engine
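To make the call-gate idea more concrete, here is a minimal software model of a CLB in C. It assumes a small fully associative cache with round-robin replacement and a flat gate table kept in memory by the microkernel; all names (`gate_t`, `clb_t`, `ext_call`, `CLB_ENTRIES`) are hypothetical. It is a sketch of the TLB-like refill pattern only, not a description of any existing hardware: a hit resolves a capability to a target address space and handler PC without entering the kernel, while a miss is counted as a kernel entry after which the microkernel refills the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical size of the call-gate lookaside buffer (CLB). */
#define CLB_ENTRIES 8

/* A call gate as kept by the microkernel in memory (hypothetical layout). */
typedef struct {
    uint64_t cap;        /* capability that names the gate */
    uint32_t asid;       /* target address space */
    uint64_t handler_pc; /* entry point of the IPC handler */
} gate_t;

/* One CLB entry: a cached copy of a gate plus a valid bit. */
typedef struct {
    gate_t gate;
    bool valid;
} clb_entry_t;

typedef struct {
    clb_entry_t entries[CLB_ENTRIES];
    unsigned next_victim; /* round-robin replacement pointer */
} clb_t;

static void clb_init(clb_t *clb) { memset(clb, 0, sizeof *clb); }

/* Fully associative lookup: the "hardware" part. NULL means a CLB miss. */
static const gate_t *clb_lookup(const clb_t *clb, uint64_t cap) {
    for (unsigned i = 0; i < CLB_ENTRIES; i++)
        if (clb->entries[i].valid && clb->entries[i].gate.cap == cap)
            return &clb->entries[i].gate;
    return NULL;
}

/* Miss handler: the "microkernel" part, analogous to a software TLB refill. */
static bool clb_refill(clb_t *clb, const gate_t *table, size_t n, uint64_t cap) {
    for (size_t i = 0; i < n; i++) {
        if (table[i].cap == cap) {
            clb_entry_t *e = &clb->entries[clb->next_victim];
            clb->next_victim = (clb->next_victim + 1) % CLB_ENTRIES;
            e->gate = table[i];
            e->valid = true;
            return true;
        }
    }
    return false; /* unknown capability: the call is rejected */
}

/* Model of an extended Call: resolve the gate to (address space, PC).
 * *kernel_entries counts how often the microkernel had to intervene. */
static bool ext_call(clb_t *clb, const gate_t *table, size_t n, uint64_t cap,
                     gate_t *target, unsigned *kernel_entries) {
    const gate_t *g = clb_lookup(clb, cap);
    if (g == NULL) {
        (*kernel_entries)++; /* trap into the microkernel on a miss */
        if (!clb_refill(clb, table, n, cap))
            return false;
        g = clb_lookup(clb, cap);
        assert(g != NULL); /* a successful refill must produce a hit */
    }
    *target = *g;
    return true;
}
```

In this model, the first call through a given gate costs one trap into the microkernel and later calls hit in the CLB. That kernel round-trip on every call is exactly what the extended Jump/Call instruction is meant to avoid.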
Communication between Address Spaces (3)
Bulk data
Observation: Memory sharing is actually quite efficient for large amounts of data (multiple pages)
Overhead is caused primarily by creating and tearing down the shared pages
Data needs to be page-aligned
Sub-page granularity and dynamic data structures
Suggestion: Using CPU cache lines as shared buffers
Much finer granularity than pages (typically 64 to 128 bytes)
A separate virtual-to-cache mapping mechanism before the standard virtual-to-physical mapping
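A short calculation helps quantify the granularity argument. The C sketch below assumes 4 KiB pages and 64-byte cache lines (the lower end of the range above). It counts how many granules a message occupies and how many extra bytes become visible to the receiver. The function and macro names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed granule sizes: 4 KiB pages and 64-byte cache lines. */
#define ASSUMED_PAGE_SIZE 4096u
#define ASSUMED_LINE_SIZE 64u

/* Number of granules of size g that a message of len bytes occupies when it
 * starts at byte offset off inside its first granule. */
static size_t granules(size_t off, size_t len, size_t g) {
    assert(g > 0 && off < g);
    if (len == 0)
        return 0;
    return (off + len + g - 1) / g;
}

/* Bytes that become visible to the receiver but do not belong to the message. */
static size_t exposed_slack(size_t off, size_t len, size_t g) {
    return granules(off, len, g) * g - len;
}
```

Take a 200-byte message starting at offset 0. Page-granular sharing exposes one 4 KiB page (3896 bytes of slack). Cache-line sharing needs four 64-byte lines (56 bytes of slack). If the same message straddles a page boundary (e.g. it starts at offset 4000), it needs two pages.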
Fast Context Switching
Current microsecond-scale latency hiding mechanisms
Hardware multi-threading
Effective
Does not scale beyond a few threads
Operating system context switching
Scales for any thread count
Too slow (order of 10 µs)
Goal: Finding a sweet spot between the two mechanisms
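A simple break-even model shows why a software context switch on the order of 10 µs cannot hide microsecond-scale stalls. The model is an assumption made here for illustration: switching away pays off only if the stall is longer than switching out and back in. It ignores cache and TLB pollution, which would make the software case look even worse. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified break-even model (an assumption, not a measurement): hiding a
 * stall by running another thread pays off only if the stall is longer than
 * switching away and switching back. Times are in microseconds. */
static bool switch_pays_off(double stall_us, double switch_us) {
    assert(stall_us >= 0.0 && switch_us >= 0.0);
    return stall_us > 2.0 * switch_us;
}

/* Fraction of the stall that turns into useful work for other threads. */
static double hidden_fraction(double stall_us, double switch_us) {
    if (!switch_pays_off(stall_us, switch_us))
        return 0.0;
    return (stall_us - 2.0 * switch_us) / stall_us;
}
```

With a 10 µs switch, a 5 µs stall is never worth hiding. With a hypothetical 0.1 µs hardware-assisted switch, about 96 % of the same stall becomes usable time.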
Fast Context Switching (2)
Suggestion: Hardware cache for contexts
Again, similar mechanism to TLB-only memory management
Dedicated instructions for context store, context restore, context switch, context save, context load
Context data could potentially be ABI-optimized
Autonomous mechanism for event-triggered context switch (e.g. external interrupt)
Efficient hardware mechanism for latency hiding
The equivalent of fine/coarse-grained simultaneous multithreading
The software scheduler is in charge of setting the scheduler policy
The CPU is in charge of scheduling the contexts based on ALU, cache and other resource availability
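The context cache can be sketched in software in the same TLB-like style. The slide lists the instructions but not their exact semantics, so this C model assigns them hypothetically. "Store" and "restore" are the two halves of a switch between the register file and a slot. "Load" and "save" move a slot from or to memory under microkernel control. The slot count, register-file size and all names are assumptions. The hardware-side scheduling based on resource availability is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NREGS 32     /* assumed size of the architectural register file */
#define CTX_SLOTS 4  /* assumed number of hardware context slots */

typedef struct {
    uint64_t pc;
    uint64_t regs[NREGS];
} ctx_t;

typedef struct {
    ctx_t slot[CTX_SLOTS];
    bool valid[CTX_SLOTS];
    int active; /* slot whose state currently lives in the register file, or -1 */
} ctx_cache_t;

static void cc_init(ctx_cache_t *cc) {
    memset(cc, 0, sizeof *cc);
    cc->active = -1;
}

/* "Context load": the microkernel fills a slot from memory (like a TLB refill). */
static void cc_load(ctx_cache_t *cc, int s, const ctx_t *mem) {
    assert(s >= 0 && s < CTX_SLOTS && s != cc->active);
    cc->slot[s] = *mem;
    cc->valid[s] = true;
}

/* "Context save": the microkernel writes an inactive slot back to memory,
 * e.g. before evicting it. The active slot is stale while it runs. */
static void cc_save(const ctx_cache_t *cc, int s, ctx_t *mem) {
    assert(s >= 0 && s < CTX_SLOTS && cc->valid[s] && s != cc->active);
    *mem = cc->slot[s];
}

/* "Context switch": store the live registers into the active slot and restore
 * the target slot. Returns false on a miss, so software must load it first. */
static bool cc_switch(ctx_cache_t *cc, ctx_t *live, int s) {
    assert(s >= 0 && s < CTX_SLOTS);
    if (!cc->valid[s])
        return false;
    if (cc->active >= 0)
        cc->slot[cc->active] = *live; /* context store */
    *live = cc->slot[s];              /* context restore */
    cc->active = s;
    return true;
}
```

As with a software-managed TLB, the fast path (`cc_switch` on a cached slot) needs no kernel code. Only misses and evictions involve the microkernel.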
User Space Interrupt Processing
Extension of the fast context switching mechanism
Efficient delivery of interrupt events to user space device drivers
Without the routine microkernel intervention
An interrupt could be directly handled by a preconfigured hardware context in user space
A clear path towards moving even the timer interrupt handler and the scheduler from kernel space to user space
Going back to interrupt-driven handling of peripherals with extremely low latency requirements (instead of polling)
The usual pain point: Level-triggered interrupts
Some coordination with the platform interrupt controller is probably needed to automatically mask the interrupt source
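Here is a minimal C model of such direct delivery, including the level-triggered case. It assumes a hypothetical per-line routing table that the microkernel configures once. On assertion, the "hardware" picks the bound user-space context and, for a level-triggered line, masks the source automatically. Without that mask, the still-asserted line would retrigger immediately. The driver unmasks the line after silencing its device. All names and the table layout are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define NIRQ 16     /* assumed number of interrupt lines */
#define NO_CTX (-1) /* no user-space context can take the interrupt now */

typedef struct {
    int ctx[NIRQ];              /* preconfigured user-space context per line */
    bool masked[NIRQ];          /* mask bit in the interrupt controller */
    bool level_triggered[NIRQ];
} irq_router_t;

static void router_init(irq_router_t *r) {
    for (int i = 0; i < NIRQ; i++) {
        r->ctx[i] = NO_CTX;
        r->masked[i] = false;
        r->level_triggered[i] = false;
    }
}

/* Done once by the microkernel: bind an interrupt line to a driver context. */
static void router_bind(irq_router_t *r, int irq, int ctx, bool level) {
    assert(irq >= 0 && irq < NIRQ && ctx >= 0);
    r->ctx[irq] = ctx;
    r->level_triggered[irq] = level;
}

/* Hardware path on interrupt assertion, without microkernel intervention.
 * NO_CTX means the line is masked (stays pending) or unbound. */
static int router_dispatch(irq_router_t *r, int irq) {
    assert(irq >= 0 && irq < NIRQ);
    if (r->masked[irq] || r->ctx[irq] == NO_CTX)
        return NO_CTX;
    if (r->level_triggered[irq])
        r->masked[irq] = true; /* automatic masking of the source */
    return r->ctx[irq];
}

/* Issued by the user-space driver once the device has deasserted the line. */
static void router_unmask(irq_router_t *r, int irq) {
    assert(irq >= 0 && irq < NIRQ);
    r->masked[irq] = false;
}
```

Edge-triggered lines are delivered on every event. A level-triggered line is delivered once and then stays masked until the driver calls `router_unmask`.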
Capabilities as First-Class Entities
Capabilities as unforgeable object identifiers
But eventually each access to an object needs to be bounds-checked and translated into the (flat) virtual address space
Suggestion: Embedding the capability reference in pointers
RV128 (128-bit variant of RISC-V) would provide 64 bits for the capability reference and 64 bits for the object offset
128-bit flat pointers are probably useless anyway
Besides the (somewhat narrow) use in the microkernel, this could be useful for other purposes
Simplifying the implementation of managed languages’ VMs
Working with multiple virtual address spaces at once
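Here is a C model of the RV128 split. Because no RV128 hardware or 128-bit integer type is assumed, a two-word struct stands in for a 128-bit register. A flat capability table with base/length entries is also assumed. All names are hypothetical. The sketch shows that once the capability travels inside the pointer, the bounds check and the translation into a flat virtual address become a single step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit pointer: upper 64 bits name a capability,
 * lower 64 bits give the byte offset inside the referenced object. */
typedef struct {
    uint64_t cap;
    uint64_t offset;
} capptr_t;

/* Capability table entry maintained by the (micro)kernel. */
typedef struct {
    uint64_t base;   /* flat virtual address of the object */
    uint64_t length; /* object size in bytes */
    bool valid;      /* false once revoked */
} cap_entry_t;

/* Pointer arithmetic only touches the offset; the capability stays intact. */
static capptr_t cap_add(capptr_t p, uint64_t delta) {
    p.offset += delta;
    return p;
}

/* Bounds-check an access of `size` bytes at p and translate it into a flat
 * virtual address. Returns false where hardware would raise a fault. */
static bool cap_translate(const cap_entry_t *table, uint64_t ntable,
                          capptr_t p, uint64_t size, uint64_t *va) {
    assert(table != NULL && va != NULL);
    if (p.cap >= ntable || !table[p.cap].valid)
        return false; /* unknown or revoked capability */
    const cap_entry_t *e = &table[p.cap];
    if (p.offset > e->length || size > e->length - p.offset)
        return false; /* out of bounds; written to avoid integer overflow */
    *va = e->base + p.offset;
    return true;
}
```

Here, an 8-byte access at offset 248 of a 256-byte object succeeds and one at offset 249 is rejected. A pointer whose capability has been revoked fails regardless of its offset.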
Prior Art
Nordström S., Lindh L., Johansson L., Skoglund T.: Application Specific Real-Time Microkernel in Hardware, 14th IEEE-NPSS Real Time Conference, 2005
Offloading basic microkernel operations (e.g. thread creation, context switching) to hardware was shown to improve performance by 15 % on average and by up to 73 %
This was a coarse-grained approach
Hardware message passing in Intel SCC and Tilera TILE-G64/TILE-Pro64
Asynchronous message passing with tight software integration
Prior Art (2)
Hajj I. E., Merritt A., Zellweger G., Milojicic D., Achermann R., Faraboschi P., Hwu W., Roscoe T., Schwan K.: SpaceJMP: Programming with Multiple Virtual Address Spaces, 21st ACM ASPLOS, 2016
Practical programming model for using multiple virtual address spaces on commodity hardware (evaluated on DragonFly BSD and Barrelfish)
Useful for data-centric applications that share large amounts of memory between processes
Intel IA-32 Task State Segment (TSS)
Hardware-based context switching
Historically, it has been used by Linux
The primary reason for removal was not performance, but portability
Prior Art (3)
Intel VT-x VM Functions (VMFUNC)
Efficient cross-VM function calls
Switching the EPT and passing register arguments
Current implementation limited to 512 entry points
Practically usable even for very fine-grained virtualization with the granularity of individual functions
Liu Y., Zhou T., Chen K., Chen H., Xia Y.: Thwarting Memory Disclosure with Efficient Hypervisor-enforced Intra-domain Isolation, 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015
– “The cost of a VMFUNC is similar with a syscall”
– “… hypervisor-level protection at the cost of system calls”
SkyBridge paper to appear at EuroSys 2019
Prior Art (4)
Woodruff J., Watson R. N. M., Chisnall D., Moore S., Anderson J., Davis B., Laurie B., Neumann P. G., Norton R., Roe M.: The CHERI capability model: Revisiting RISC in an age of risk, 41st ACM Annual International Symposium on Computer Architecture, 2014
Hardware-based capability model for byte-granularity memory protection
Extension of the 64-bit MIPS ISA
Evaluated on an extended MIPS R4000 FPGA soft-core
32 capability registers (256 bits)
Limitation: Inflexible design, mostly due to the tight backward compatibility with a 64-bit ISA
Intel MPX
Several design and implementation issues, deemed not production-ready
Summary
Traditionally, hardware has not been designed to accommodate the requirements of microkernel multiserver operating systems
Microkernels thus suffer performance penalties
This prevented them from replacing monolithic operating systems and closed the vicious cycle
Hardware design is hopefully becoming more accessible and democratic
E.g. RISC-V
Co-designing the hardware and software might help us gain the benefits of the microkernel multiserver design with no performance penalties
However, it requires some out-of-the-box thinking
Acknowledgements
OS Kernel Lab at Huawei Technologies
Javier Picorel
Haibo Chen
Huawei Dresden R&D Lab
Focusing on microkernel research, design and development
Basic research
Applied research
Prototype development
Collaboration with academia and other technology companies
Looking for senior operating system researchers, designers, developers and experts
Previous microkernel experience is a big plus
“A startup within a large company”
Shaping the future product portfolio of Huawei
Including hardware/software co-design via HiSilicon
Q&A
Thank You!

Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 

Recently uploaded (20)

Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 

Hardware/Software Co-Design for Efficient Microkernel Execution

  • 1. Hardware/Software Co-Design for Efficient Microkernel Execution. Martin Děcký (martin.decky@huawei.com), February 2019
  • 2. Martin Děcký, FOSDEM, February 3rd 2019. Hardware/Software Co-Design for Efficient Microkernel Execution. Who Am I: Passionate programmer and operating systems enthusiast, with a specific inclination towards multiserver microkernels. HelenOS developer since 2004. Research Scientist from 2006 to 2018 at Charles University (Prague), Distributed Systems Research Group. Senior Research Engineer since 2017 at Huawei Technologies (Munich), German Research Center, Central Software Institute, OS Kernel Lab.
  • 3. Microkernel Multiserver Systems are better than Monolithic Systems
  • 4. Monolithic OS Design is Flawed. Biggs S., Lee D., Heiser G.: The Jury Is In: Monolithic OS Design Is Flawed: Microkernel-based Designs Improve Security, ACM 9th Asia-Pacific Workshop on Systems (APSys), 2018. “While intuitive, the benefits of the small TCB have not been quantified to date. We address this by a study of critical Linux CVEs, where we examine whether they would be prevented or mitigated by a microkernel-based design. We find that almost all exploits are at least mitigated to less than critical severity, and 40 % completely eliminated by an OS design based on a verified microkernel, such as seL4.”
  • 5. Problem Statement
  • 6. Problem Statement. Microkernel design ideas go as far back as 1969: the RC 4000 Multiprogramming System nucleus (Per Brinch Hansen) already featured isolation of unprivileged processes, inter-process communication and hierarchical control. Even after 50 years, these ideas are not fully accepted as mainstream. Hardware and software used to be designed independently: designing CPUs used to be an extremely complicated and costly process, operating systems used to be written after the CPUs were designed, and hardware designs used to be rather conservative.
  • 7. Problem Statement (2). Mainstream ISAs used to be designed in a rather conservative way. Can you name some really revolutionary ISA features since IBM System/370 Advanced Function? Requirements on new ISAs usually follow the needs of the mainstream operating systems running on past ISAs. No wonder microkernels suffer performance penalties compared to monolithic systems; the more fine-grained the architecture, the more penalties it suffers. Let us design the hardware with microkernels in mind!
  • 8. The Vicious Cycle: CPUs do not support microkernels properly.
  • 9. The Vicious Cycle: CPUs do not support microkernels properly → microkernels suffer performance penalties.
  • 10. The Vicious Cycle: CPUs do not support microkernels properly → microkernels suffer performance penalties → microkernels are not in the mainstream.
  • 11. The Vicious Cycle: CPUs do not support microkernels properly → microkernels suffer performance penalties → microkernels are not in the mainstream → no requirements on CPUs from microkernels.
  • 12. The Vicious Cycle: CPUs do not support microkernels properly → microkernels suffer performance penalties → microkernels are not in the mainstream → no requirements on CPUs from microkernels.
  • 13. Any Ideas?
  • 14. Communication between Address Spaces. Control and data flow between subsystems. Monolithic kernel: function calls; passing arguments in registers and on the stack; passing direct pointers to memory structures. Multiserver microkernel: IPC via microkernel syscalls; passing arguments in a subset of registers; privilege level switch and address space switch; scheduling (in case of asynchronous IPC); data copying or memory sharing with page granularity.
  • 15. Communication between Address Spaces (2). Is the kernel round-trip of the IPC necessary? Suggestion for synchronous IPC: extended Jump/Call and Return instructions that also switch the address space. Communicating parties are identified by a “call gate” (capability) containing the target address space and the PC of the IPC handler (implicit for return). Call gates are stored in a TLB-like hardware cache (CLB), which the microkernel populates similarly to a TLB-only memory management architecture. Suggestion for asynchronous IPC: using CPU cache lines as the buffers for the messages; Async Jump/Call, Async Return and Async Receive instructions; using the CPU cache like an extended register stack engine.
  • 16. Communication between Address Spaces (3). Bulk data. Observation: memory sharing is actually quite efficient for large amounts of data (multiple pages). The overhead is caused primarily by creating and tearing down the shared pages, and the data needs to be page-aligned. Sub-page granularity and dynamic data structures. Suggestion: using CPU cache lines as shared buffers, with much finer granularity than pages (typically 64 to 128 bytes), and a separate virtual-to-cache mapping mechanism before the standard virtual-to-physical mapping.
  • 17. Fast Context Switching. Current microsecond-scale latency hiding mechanisms: hardware multi-threading (effective, but does not scale beyond a few threads) and operating system context switching (scales for any thread count, but too slow, on the order of 10 µs). Goal: finding a sweet spot between the two mechanisms.
  • 18. Fast Context Switching (2). Suggestion: a hardware cache for contexts, again a mechanism similar to TLB-only memory management. Dedicated instructions for context store, context restore, context switch, context save and context load; context data could potentially be ABI-optimized. An autonomous mechanism for event-triggered context switches (e.g. an external interrupt). An efficient hardware mechanism for latency hiding, the equivalent of fine/coarse-grained simultaneous multithreading: the software scheduler is in charge of setting the scheduling policy, while the CPU is in charge of scheduling the contexts based on ALU, cache and other resource availability.
  • 19. User Space Interrupt Processing. An extension of the fast context switching mechanism: efficient delivery of interrupt events to user space device drivers without the routine microkernel intervention. An interrupt could be directly handled by a preconfigured hardware context in user space. This is a clear path towards moving even the timer interrupt handler and the scheduler from kernel space to user space. It also allows going back to interrupt-driven handling (instead of polling) of peripherals with extremely low latency requirements. The usual pain point: level-triggered interrupts. Some coordination with the platform interrupt controller is probably needed to automatically mask the interrupt source.
  • 20. Capabilities as First-Class Entities. Capabilities serve as unforgeable object identifiers, but eventually each access to an object needs to be bounds-checked and translated into the (flat) virtual address space. Suggestion: embedding the capability reference in pointers. RV128 (the 128-bit variant of RISC-V) would provide 64 bits for the capability reference and 64 bits for the object offset; 128-bit flat pointers are probably useless anyway. Besides the (somewhat narrow) use in the microkernel, this could be useful for other purposes: simplifying the implementation of managed languages’ VMs, and working with multiple virtual address spaces at once.
  • 21. Prior Art. Nordström S., Lindh L., Johansson L., Skoglund T.: Application Specific Real-Time Microkernel in Hardware, 14th IEEE-NPSS Real Time Conference, 2005. Offloading basic microkernel operations (e.g. thread creation, context switching) to hardware was shown to improve performance by 15 % on average and by up to 73 %; this was a coarse-grained approach. Hardware message passing in Intel SCC and Tilera TILE-G64/TILE-Pro64: asynchronous message passing with tight software integration.
  • 22. Prior Art (2). Hajj I. E., Merritt A., Zellweger G., Milojicic D., Achermann R., Faraboschi P., Hwu W., Roscoe T., Schwan K.: SpaceJMP: Programming with Multiple Virtual Address Spaces, 21st ACM ASPLOS, 2016. A practical programming model for using multiple virtual address spaces on commodity hardware (evaluated on DragonFly BSD and Barrelfish), useful for data-centric applications that share large amounts of memory between processes. Intel IA-32 Task State Segment (TSS): hardware-based context switching. Historically, it was used by Linux; the primary reason for its removal was not performance, but portability.
  • 23. Prior Art (3). Intel VT-x VM Functions (VMFUNC): efficient cross-VM function calls, switching the EPT and passing register arguments. The current implementation is limited to 512 entry points, yet it is practically usable even for very fine-grained virtualization with the granularity of individual functions. Liu Y., Zhou T., Chen K., Chen H., Xia Y.: Thwarting Memory Disclosure with Efficient Hypervisor-enforced Intra-domain Isolation, 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015: “The cost of a VMFUNC is similar with a syscall”; “… hypervisor-level protection at the cost of system calls”. SkyBridge paper to appear at EuroSys 2019.
  • 24. Prior Art (4). Woodruff J., Watson R. N. M., Chisnall D., Moore S., Anderson J., Davis B., Laurie B., Neumann P. G., Norton R., Roe M.: The CHERI capability model: Revisiting RISC in an age of risk, 41st ACM Annual International Symposium on Computer Architecture, 2014. A hardware-based capability model for byte-granularity memory protection. It extends the 64-bit MIPS ISA, was evaluated on an extended MIPS R4000 FPGA soft-core, and provides 32 capability registers (256 bits). Limitation: an inflexible design, mostly due to the tight backward compatibility with a 64-bit ISA. Intel MPX: several design and implementation issues, deemed not production-ready.
  • 25. Summary. Traditionally, hardware has not been designed to accommodate the requirements of microkernel multiserver operating systems. Microkernels thus suffer performance penalties, which prevented them from replacing monolithic operating systems and closed the vicious cycle. Hardware design is hopefully becoming more accessible and democratic (e.g. RISC-V). Co-designing the hardware and software might help us gain the benefits of the microkernel multiserver design with no performance penalties. However, it requires some out-of-the-box thinking.
  • 26. Acknowledgements: OS Kernel Lab at Huawei Technologies; Javier Picorel; Haibo Chen.
  • 27. Huawei Dresden R&D Lab. Focusing on microkernel research, design and development: basic research, applied research, prototype development, and collaboration with academia and other technology companies. Looking for senior operating system researchers, designers, developers and experts; previous microkernel experience is a big plus. “A startup within a large company”: shaping the future product portfolio of Huawei, including hardware/software co-design via HiSilicon.
  • 28. Q&A
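Slide 15 suggests synchronous IPC through extended Call/Return instructions. The call gates they use would be cached in a TLB-like CLB that the microkernel refills on a miss. A software model can make that lookup/refill split concrete. The following is a minimal C sketch under stated assumptions, not a hardware design from the talk:

- All names (`call_gate_t`, `clb_lookup`, `clb_refill`, `ext_call`, `ext_return`) are hypothetical.
- The CLB is assumed to have 8 fully associative entries with round-robin replacement.
- Only one level of call nesting is tracked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical call gate: a capability naming the target address space
   and the PC of its IPC handler. */
typedef struct {
    uint32_t gate_id;   /* capability index used by the caller */
    uint16_t asid;      /* target address space identifier */
    uintptr_t entry_pc; /* IPC handler entry point in the target */
    bool valid;
} call_gate_t;

#define CLB_ENTRIES 8   /* assumed size of the call-gate lookaside buffer */

typedef struct {
    call_gate_t entry[CLB_ENTRIES];
    unsigned next_victim;   /* round-robin replacement pointer */
} clb_t;

/* Hypothetical CPU state: current address space and PC, plus the return
   linkage saved by the extended Call (one nesting level only). */
typedef struct {
    uint16_t asid;
    uintptr_t pc;
    uint16_t ret_asid;
    uintptr_t ret_pc;
} cpu_t;

/* Fully associative lookup performed by the extended Call instruction. */
static bool clb_lookup(const clb_t *clb, uint32_t gate_id, call_gate_t *out)
{
    for (size_t i = 0; i < CLB_ENTRIES; i++) {
        if (clb->entry[i].valid && clb->entry[i].gate_id == gate_id) {
            *out = clb->entry[i];
            return true;
        }
    }
    return false;
}

/* Refill performed by the microkernel's CLB miss handler, analogous to a
   software-managed TLB refill. */
static void clb_refill(clb_t *clb, call_gate_t gate)
{
    assert(clb->next_victim < CLB_ENTRIES);
    gate.valid = true;
    clb->entry[clb->next_victim] = gate;
    clb->next_victim = (clb->next_victim + 1) % CLB_ENTRIES;
}

/* Extended Call: on a CLB hit, switch the address space and jump to the
   handler without entering the kernel. Returns false on a miss, where a
   real CPU would trap into the microkernel to refill the CLB. */
static bool ext_call(cpu_t *cpu, const clb_t *clb, uint32_t gate_id,
                     uintptr_t return_pc)
{
    call_gate_t gate;
    if (!clb_lookup(clb, gate_id, &gate))
        return false;
    cpu->ret_asid = cpu->asid;
    cpu->ret_pc = return_pc;
    cpu->asid = gate.asid;
    cpu->pc = gate.entry_pc;
    return true;
}

/* Extended Return: the return gate is implicit (the saved linkage). */
static void ext_return(cpu_t *cpu)
{
    cpu->asid = cpu->ret_asid;
    cpu->pc = cpu->ret_pc;
}
```

In this model only a CLB miss costs a kernel round-trip, and the common case of a hit is a direct cross-address-space jump. This is the same trade-off a software-managed TLB makes.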
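Slide 16 argues for cache-line rather than page granularity for sub-page data. The sketch below quantifies that argument. It computes how many bytes must be shared to expose a buffer when sharing works in fixed-size, power-of-two units. It assumes 4 KiB pages and 64-byte cache lines (the lower end of the 64 to 128 byte range on the slide). `shared_span` and `overshare` are hypothetical helpers used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes that must be shared so that the buffer
   [offset, offset + len) is fully covered when sharing works in units of
   `granule` bytes (a power of two, e.g. a 4096-byte page or a 64-byte
   cache line). */
static size_t shared_span(size_t offset, size_t len, size_t granule)
{
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    if (len == 0)
        return 0;
    size_t mask = granule - 1;
    size_t start = offset & ~mask;                /* round start down */
    size_t end = (offset + len + mask) & ~mask;   /* round end up */
    return end - start;
}

/* Bytes exposed beyond the buffer itself: the over-sharing cost. */
static size_t overshare(size_t offset, size_t len, size_t granule)
{
    return shared_span(offset, len, granule) - len;
}
```

Take a 200-byte message starting at offset 100. Page-granular sharing exposes 4096 bytes, whereas 64-byte lines expose 256 bytes (the lines covering bytes 64 through 319).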
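Slide 18 proposes a hardware cache of thread contexts managed much like a TLB. A software model helps show the intended effect. Switching to a resident context needs no memory traffic; only a miss performs the "context save" of a victim and the "context load" of the target. This C sketch rests on several assumptions:

- There are 4 context slots.
- Each context holds 32 general-purpose registers.
- Replacement is LRU.
- Small integer thread IDs index an in-memory context array.

All names are hypothetical, and a real design would leave the replacement policy to the hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CTX_SLOTS 4   /* assumed number of hardware context slots */
#define NREGS 32      /* assumed general-purpose registers per context */

typedef struct {
    uint64_t regs[NREGS];
    uint64_t pc;
} context_t;

/* Software model of a hypothetical hardware context cache. */
typedef struct {
    int tid[CTX_SLOTS];             /* resident thread, -1 if the slot is free */
    context_t ctx[CTX_SLOTS];       /* register state held on-chip */
    uint64_t last_used[CTX_SLOTS];  /* timestamps for LRU replacement */
    uint64_t clock;
    unsigned misses;
} ctx_cache_t;

static void ctx_cache_init(ctx_cache_t *c)
{
    memset(c, 0, sizeof(*c));
    for (int i = 0; i < CTX_SLOTS; i++)
        c->tid[i] = -1;
}

/* Context switch to thread `tid`; `memory` holds the in-memory contexts of
   all threads, indexed by tid. Returns the slot now holding the context. */
static int ctx_switch(ctx_cache_t *c, int tid, context_t memory[])
{
    assert(tid >= 0);
    c->clock++;

    /* Hit: the context is already on-chip, no memory traffic needed. */
    for (int i = 0; i < CTX_SLOTS; i++) {
        if (c->tid[i] == tid) {
            c->last_used[i] = c->clock;
            return i;
        }
    }

    /* Miss: use a free slot if any, otherwise evict the LRU context. */
    int victim = 0;
    for (int i = 0; i < CTX_SLOTS; i++) {
        if (c->tid[i] == -1) {
            victim = i;
            break;
        }
        if (c->last_used[i] < c->last_used[victim])
            victim = i;
    }

    if (c->tid[victim] != -1)
        memory[c->tid[victim]] = c->ctx[victim];   /* context save */
    c->ctx[victim] = memory[tid];                   /* context load */
    c->tid[victim] = tid;
    c->last_used[victim] = c->clock;
    c->misses++;
    return victim;
}
```

With up to four runnable threads, every switch after warm-up is a hit in this model. The OS-level save/restore cost (on the order of 10 µs per slide 17) is paid only when the working set exceeds the on-chip slots.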
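Slide 19 names level-triggered interrupts as the pain point. The line stays asserted until the driver silences the device, so a directly delivered interrupt would fire again immediately. One way the suggested coordination with the interrupt controller could work is for the controller to mask the source as it activates the user-space context. The driver would then unmask the source when it is done. The sketch below models that under a few assumptions: a 32-line controller, a per-line binding to a preconfigured hardware context, and hypothetical names throughout. It is not a description of any existing controller.

```c
#include <assert.h>
#include <stdint.h>

#define NIRQ 32   /* assumed number of interrupt lines */

/* Software model of a hypothetical interrupt controller that delivers
   interrupts directly to preconfigured user-space hardware contexts. */
typedef struct {
    uint32_t masked;          /* bit n set: line n is masked */
    int bound_ctx[NIRQ];      /* hardware context for line n, -1 if unbound */
    unsigned delivered[NIRQ]; /* deliveries per line, for illustration */
} intc_t;

static void intc_init(intc_t *ic)
{
    ic->masked = 0;
    for (unsigned i = 0; i < NIRQ; i++) {
        ic->bound_ctx[i] = -1;
        ic->delivered[i] = 0;
    }
}

/* Hypothetical binding step, done once by the microkernel at setup. */
static void intc_bind(intc_t *ic, unsigned irq, int ctx)
{
    assert(irq < NIRQ);
    ic->bound_ctx[irq] = ctx;
}

/* Called each time the controller samples level-triggered line `irq` as
   asserted. Returns the context to activate, or -1 if nothing is delivered
   (line masked or not bound to a user-space driver). */
static int intc_line_asserted(intc_t *ic, unsigned irq)
{
    assert(irq < NIRQ);
    uint32_t bit = UINT32_C(1) << irq;
    if ((ic->masked & bit) != 0 || ic->bound_ctx[irq] < 0)
        return -1;
    /* Auto-mask: the line stays asserted until the driver silences the
       device, so without this the same interrupt would fire again at once. */
    ic->masked |= bit;
    ic->delivered[irq]++;
    return ic->bound_ctx[irq];
}

/* Issued by the user-space driver once it has cleared the device's
   interrupt condition; in this model the microkernel is not involved. */
static void intc_unmask(intc_t *ic, unsigned irq)
{
    assert(irq < NIRQ);
    ic->masked &= ~(UINT32_C(1) << irq);
}
```

The microkernel appears only in the one-time `intc_bind` setup. Routine delivery and unmasking stay between the controller and the driver's hardware context, which matches the slide's goal of avoiding the routine microkernel intervention.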
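Slide 20's RV128 idea splits a 128-bit pointer into a 64-bit capability reference and a 64-bit object offset. The C sketch below models such a pointer as a pair of `uint64_t` fields rather than relying on a 128-bit integer type. It shows the bounds check and the translation into the flat virtual address space that, per the slide, every access eventually needs. The capability table layout and all names (`cap_ptr_t`, `cap_entry_t`, `cap_translate`, `cap_ptr_add`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit pointer: the upper 64 bits name a capability,
   the lower 64 bits are a byte offset within the referenced object. */
typedef struct {
    uint64_t cap;
    uint64_t offset;
} cap_ptr_t;

/* Hypothetical capability table entry, writable only by the microkernel,
   which is what keeps the capability references unforgeable. */
typedef struct {
    bool valid;
    uint64_t base;    /* object's address in the flat virtual address space */
    uint64_t length;  /* object size in bytes */
} cap_entry_t;

/* Bounds-check an access of `size` bytes through `p` and translate it to a
   flat virtual address. Returns false where hardware would raise a fault. */
static bool cap_translate(const cap_entry_t *table, size_t table_len,
                          cap_ptr_t p, uint64_t size, uint64_t *va)
{
    assert(table != NULL && va != NULL);
    if (p.cap >= table_len || !table[p.cap].valid)
        return false;
    const cap_entry_t *e = &table[p.cap];
    /* Written as a subtraction so that offset + size cannot overflow. */
    if (size > e->length || p.offset > e->length - size)
        return false;
    *va = e->base + p.offset;
    return true;
}

/* Pointer arithmetic only changes the offset; the capability half is
   carried along unchanged. */
static cap_ptr_t cap_ptr_add(cap_ptr_t p, uint64_t delta)
{
    p.offset += delta;
    return p;
}
```

In this model an out-of-bounds pointer can still be computed, but any access through it fails the bounds check. Translation needs only a table lookup and one comparison.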