This document summarizes several publications related to hardware contributions to academia involving efficient storage of tag bits in DRAM, efficient revocation of pointers for temporal safety using capability tags, and the CHERI ISA reference manual. It describes work on tagged memory, CHERIvoke for temporal safety enforcement, and the CHERI architecture including formal models, simulators, and test frameworks.
Digital Security by Design: CHERI-RISC-V - Simon Moore, University of Cambridge
1. Example hardware contributions to academia (2/3)
• Published in IEEE ICCD conference, 2017
• Efficient storage of tag bits in commodity DRAM
Efficient Tagged Memory
Alexandre Joannou⇤, Jonathan Woodruff⇤, Robert Kovacsics⇤, Simon W. Moore⇤, Alex Bradbury⇤, Hongyan Xia⇤,
Robert N. M. Watson⇤, David Chisnall⇤, Michael Roe⇤, Brooks Davis†, Edward Napierala⇤,
John Baldwin†, Khilan Gudka⇤, Peter G. Neumann†, Alfredo Mazzinghi⇤,
Alex Richardson⇤, Stacey Son†, A. Theodore Markettos⇤
⇤Computer Laboratory, University of Cambridge, Cambridge, UK †SRI International, Menlo Park, CA, USA
Website: www.cl.cam.ac.uk/research/comparch Website: www.sri.com
Abstract—We characterize the cache behavior of an in-memory
tag table and demonstrate that an optimized implementation
can typically achieve a near-zero memory traffic overhead. Both
industry and academia have repeatedly demonstrated tagged
memory as a key mechanism to enable enforcement of power-
ful security invariants, including capabilities, pointer integrity,
watchpoints, and information-flow tracking. A single-bit tag
shadowspace is the most commonly proposed requirement, as
one bit is the minimum metadata needed to distinguish between
an untyped data word and any number of new hardware-
enforced types. We survey various tag shadowspace approaches
and identify their common requirements and positive features of
their implementations. To avoid non-standard memory widths,
we identify the most practical implementation for tag storage to
be an in-memory table managed next to the DRAM controller.
We characterize the caching performance of such a tag table
and demonstrate a DRAM traffic overhead below 5% for the
vast majority of applications. We identify spatial locality on a
page scale as the primary factor that enables surprisingly high
table cache-ability. We then demonstrate tag-table compression
for a set of common applications. A hierarchical structure with
elegantly simple optimizations reduces DRAM traffic overhead to
below 1% for most applications. These insights and optimizations
pave the way for commercial applications making use of single-bit
tags stored in commodity memory.
I. INTRODUCTION
Hardware support for tagged memory has been implemented
from early days of computer architecture [1], [2], and tagged
patterns sufficiently to inform implementations or further
optimizations.
For simplicity, we identify three points in the tagging design
space: no tag, a single-bit tag (SBT), or a multi-bit tag (MBT)
per word. This paper demonstrates that SBT systems can be
nearly as efficient as untagged memory. We do not attempt
to optimize MBTs, although some of the principles here will
also apply to small MBT systems.
The contributions of this paper include:
• A survey of proposed implementations of SBT systems
identifying a practical approach: an in-DRAM tag table
with a tag cache next to the DRAM controller, including
tags with metadata in data caches.
• A characterization of the dynamic workload of tag-
table caches whose hit rates can be surprisingly high,
considering that we are below the last-level cache so most
temporal and spatial locality has already been exploited.
We sweep parameter spaces and evaluate against a range
of benchmarks with diverse characteristics.
• A characterization of an elegantly simple and highly
effective compression scheme for three tag use cases,
finding that it reduces overhead for tag-memory traffic
to nearly zero for most applications.
Benchmarks run on our FPGA implementation confirm the
simulation results, and demonstrate that an SBT memory
2. Example hardware contributions to academia (3/3)
• Published in IEEE Micro conference, October 2019
• Efficient revocation of pointers for temporal safety
• Eliminate use-after-reallocation bugs
• Capability tags makes finding pointers fast and precise
• Performance overheads under 5% (for 25% heap size increase)
CHERIvoke: Characterizing Pointer Revocation using
CHERI Capabilities for Temporal Memory Safety
Hongyan Xia*1, Jonathan Woodruff*1, Sam Ainsworth*1, Nathaniel W. Filardo1,
Peter G. Neumann2, Simon W. Moore1, Robert N. M. Watson1, and Timothy M. Jones1
1University of Cambridge, 2SRI International, {firstname.lastname}@cl.cam.ac.uk
ABSTRACT
A lack of temporal safety in low-level languages has led to
an epidemic of use-after-free exploits. These have surpassed
in number and severity even the infamous buffer-overflow
exploits violating spatial safety. Capability addressing can di-
rectly enforce spatial safety for the C language by enforcing
bounds on pointers and by rendering pointers unforgeable.
Nevertheless, an efficient solution for strong temporal mem-
ory safety remains elusive.
CHERI is an architectural extension to provide hardware
capability addressing that is seeing significant commercial
and open-source interest. We show that CHERI capabilities
can be used as a foundation to enable low-cost heap tempo-
ral safety by facilitating out-of-date pointer revocation, as
capabilities enable precise and efficient identification and in-
validation of pointers, even when using unsafe languages such
as C. We develop CHERIvoke, a technique for deterministic
and fast sweeping revocation to enforce temporal safety on
CHERI systems. CHERIvoke quarantines freed data before
periodically using a small shadow map to revoke all dangling
pointers in a single sweep of memory, and provides a tunable
tradeoff between performance and heap growth. We evalu-
ate the performance of such a system for high-performance
processors, and further analytically examine its primary over-
heads. When configured with a heap-size overhead of 25%,
we find that CHERIvoke achieves an average execution-time
overhead of under 5%, far below the overheads associated
with traditional garbage collection, revocation, or page-table
systems.
Spatial safety can be guaranteed at low cost by using
architectural extensions, such as CHERI [38], a hardware
capability architecture that is influencing the direction of
industry [18]. CHERI replaces pointers with unforgeable,
architecturally identifiable references (capabilities) that con-
vey not only the current address, but also the full range that
is legally accessible through that reference. In this paper,
we show that CHERI further allows us to achieve low-cost
temporal safety for the heap in low-level languages, such
as C and C++, with only minor architectural changes. By
comparison, legacy architectures do not allow fine-grained
temporal safety of untrusted programs, as they can neither
eliminate the possibility of retaining or fabricating references
to freed memory nor uniquely distinguish dangling pointers
from innocuous data.
We have designed CHERIvoke, a technique providing tem-
poral safety for memory allocators on top of hardware ca-
pabilities, complete with minor architectural extensions for
performance. CHERIvoke delays reallocation of memory,
holding manually-freed objects in a quarantine buffer un-
til performing a sweep of memory to remove all references
to these objects. An implementation of CHERIvoke thus
prevents reallocation of memory that may still be address-
able by references available to the program. Furthermore,
CHERIvoke specifies a data structure for describing the freed
memory locations using a shadow map wherein one bit repre-
sents a 16-byte allocation granule. This shadow map enables
a single sweep to revoke access to an arbitrary number of
memory locations, regardless of heap layout.
CHERIvoke provides fast, uncircumventable temporal guar-
3. CHERI ISA Refence Manual
• Architecture-neutral CHERI model
• Elaborated CHERI-RISC-V ISA
• CHERI Concentrate capability compression
model (IEEETC 2019)
• Side-channel resistance features
• Improved C-language compatibility, dynamic
linkage, performance optimizations (ASPLOS
2019)
• Experimental features including 64-bit
capabilities for 32-bit architectures (ICCD
2018), temporal safety
(IEEE Micro 2019)
• All instruction pseudocode derived from Sail
formal models
39
4. CHERI-RISC-V: models, simulators & test
• Formal model of CHERI-RISC-V in Sail (more later)
• Simulators, etc., can be produced
• Spike ISA simulator (main RISC-V simulator)
• TestRIG
• Tests processor-under-test with a model (e.g. Sail/Spike)
• Directed random test generation system
• Directly injects test sequences into the processor
• Performs test case shrinkage: identifies your bugs fast
40
5. TestRIG in action
41
compare generate
VEngine
network socket network socket
RVFI-DII
standard
design under test
(Piccolo…)
golden model
(Sail-rv, Spike, RVBS)
Haskell,
OCaml, …
BSV, Sail,
C++, …
6. RVFI-DII Pipeline Instrumentation
• Cost of automated testing with minimization: Implementation
instrumentation
• Appropriate for the age of FPGAs and open-source hardware
42
Direct Instruction
Injection (DII)
RVFI Execution
Trace
IF/ID ID/EX EX/MEM MEM/WB
General
Purpose
Registers
ALU
Data
MemoryInstruction
Memory
PC
Defined memory layout
Instruction
memory bypass
7. Summary
• CHERI-RISC-V
• ISA models and simulators available now
• Implementations available
• and more being worked on
• Toolchains available: Clang/LLVM compiler (now),
CHERI-FreeRTOS (now), CheriBSD (Q2 2020), etc.
• Great for academic hardware work
• Please use our stuff and collaborate!
• More on: cheri-cpu.org
43