Presentation: Processor with Integrated Real-Time Garbage Collection

A Novel RISC Processor Architecture For
Garbage Collection in Embedded Systems

© TLB GmbH, Karlsruhe 2012

Buffer Overflows are Responsible for a
Large Number of Today’s Security and Safety Problems

• In a standard computer system, dynamically growing data structures can grow past their allocated bounds and overwrite unrelated data (buffer overflow)
• The standard processor architecture lacks protection mechanisms against buffer overflows
• Buffer overflow errors are a common cause of critical security vulnerabilities
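To illustrate the failure mode described above, the following Python sketch models memory as a flat bytearray in which a 4-byte buffer sits directly in front of an unrelated 4-byte field. The layout, the offsets, and the helper names `unchecked_copy` and `checked_copy` are hypothetical; the sketch only shows that a copy without a bounds check silently overwrites the neighbouring field, whereas a bounds check turns the same error into a detectable fault.

```python
# Flat memory model: nothing separates neighbouring objects (hypothetical layout).
MEMORY_SIZE = 8
BUF_OFFSET, BUF_LEN = 0, 4      # hypothetical 4-byte buffer
FLAG_OFFSET, FLAG_LEN = 4, 4    # hypothetical unrelated field placed right after it


def unchecked_copy(memory: bytearray, offset: int, data: bytes) -> None:
    """Copy data to memory[offset:] with no bounds check (like an unchecked memcpy)."""
    for i, byte in enumerate(data):
        memory[offset + i] = byte


def checked_copy(memory: bytearray, offset: int, length: int, data: bytes) -> None:
    """Copy data only if it fits into the `length` bytes reserved at `offset`."""
    if len(data) > length:
        raise IndexError(f"write of {len(data)} bytes exceeds buffer of {length} bytes")
    memory[offset:offset + len(data)] = data


def demo() -> tuple:
    memory = bytearray(MEMORY_SIZE)
    memory[FLAG_OFFSET:FLAG_OFFSET + FLAG_LEN] = b"SAFE"
    unchecked_copy(memory, BUF_OFFSET, b"AAAAEVIL")   # 8 bytes into a 4-byte buffer
    flag_after_overflow = bytes(memory[FLAG_OFFSET:FLAG_OFFSET + FLAG_LEN])

    memory[FLAG_OFFSET:FLAG_OFFSET + FLAG_LEN] = b"SAFE"
    try:
        checked_copy(memory, BUF_OFFSET, BUF_LEN, b"AAAAEVIL")
        rejected = False
    except IndexError:
        rejected = True
    return flag_after_overflow, rejected


if __name__ == "__main__":
    print(demo())  # (b'EVIL', True)
```

The unchecked copy leaves the neighbouring field reading `EVIL` instead of `SAFE`; the checked copy refuses the write.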


Garbage Collection Helps Reduce Buffer Overflows,
But Causes High Overhead & Unpredictable Pauses

• Automatic dynamic memory management releases and compacts dynamically allocated memory once it is no longer used
• Such “garbage collection” reduces common sources of buffer overflows
• Existing garbage collection is mostly software-based, incurs high overhead, and causes unpredictable pauses in program execution
• The limited resources of embedded systems typically do not allow efficient garbage collection in real time


A Novel Approach Enables Parallel Garbage Collection
And Parallel Synchronization in Real Time

• The novel RISC processor architecture is optimized for security:
  ◦ Pointers are strictly separated from ordinary (non-pointer) data by distinct register sets for pointers and data
• A dedicated coprocessor performs the garbage collection:
  ◦ It uses an optimized Baker-style copying collector that runs in parallel with the main processor
  ◦ It starts a new garbage collection cycle when available memory falls below a chosen threshold
• Simple hardware extensions to the processor pipeline support synchronization between the garbage collector and the main processor
  ◦ These extensions are key to an efficient implementation that avoids unbounded pauses
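The core of a Baker-style incremental copying collector can be sketched in Python under simplifying assumptions: each object keeps its pointer fields and data fields in separate lists (mirroring the separate pointer and data registers), `roots` stands in for the pointer register set, a new cycle starts when free space in the current semispace falls below a threshold, and a read barrier evacuates any fromspace object before a pointer to it reaches a register. The names `BakerHeap`, `read_ptr`, and `gc_step` and all sizes are hypothetical. The read barrier is the classic Baker mechanism; the slides do not state exactly how the pipeline extensions synchronize the two processors, and in the hardware the scanning done by `gc_step` runs on the coprocessor rather than in software.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Obj:
    ptrs: list                        # pointer fields: references to Obj or None
    data: list                        # non-pointer fields, never traced
    space: int = 0                    # semispace (0 or 1) holding this object
    forward: Optional["Obj"] = None   # forwarding pointer, set once evacuated

    def size(self) -> int:
        return len(self.ptrs) + len(self.data)


class BakerHeap:
    """Toy two-semispace heap with a Baker-style read barrier (hypothetical sizes)."""

    def __init__(self, semispace_words: int, threshold_words: int) -> None:
        self.capacity = semispace_words
        self.threshold = threshold_words
        self.current = 0            # index of tospace
        self.used = 0               # words occupied in tospace
        self.scan_queue = deque()   # copied objects whose fields are not yet scanned
        self.collecting = False
        self.roots = []             # models the pointer register set

    def free(self) -> int:
        return self.capacity - self.used

    def allocate(self, n_ptrs: int, n_data: int) -> Obj:
        # Start a new cycle when free memory drops below the threshold.
        if not self.collecting and self.free() < self.threshold:
            self.flip()
        size = n_ptrs + n_data
        if size > self.free():
            raise MemoryError("semispace exhausted")
        self.used += size
        # New objects go straight into tospace and need no scanning.
        return Obj([None] * n_ptrs, [0] * n_data, space=self.current)

    def flip(self) -> None:
        """Swap the semispaces and evacuate the objects the roots refer to."""
        self.current ^= 1
        self.used = 0
        self.collecting = True
        self.roots = [self.evacuate(r) for r in self.roots]

    def evacuate(self, obj):
        """Copy a fromspace object to tospace once; return its tospace version."""
        if obj is None or obj.space == self.current:
            return obj
        if obj.forward is None:
            copy = Obj(list(obj.ptrs), list(obj.data), space=self.current)
            self.used += copy.size()
            obj.forward = copy
            self.scan_queue.append(copy)   # its pointer fields still need scanning
        return obj.forward

    def read_ptr(self, obj: Obj, i: int):
        """Read barrier: a pointer loaded into a register never refers to fromspace."""
        target = self.evacuate(obj.ptrs[i])
        obj.ptrs[i] = target
        return target

    def gc_step(self) -> None:
        """One unit of collector work (done in parallel by the coprocessor)."""
        if self.scan_queue:
            grey = self.scan_queue.popleft()
            grey.ptrs = [self.evacuate(p) for p in grey.ptrs]
        elif self.collecting:
            self.collecting = False   # everything still in fromspace is garbage
```

Because every pointer the mutator can hold already refers to tospace, scanning can complete while the mutator keeps running; each barrier hit copies at most one object, which is what keeps individual pauses short.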


This Novel Approach Improves Performance
By Leaving The Cache Largely Unaffected

• Software garbage collectors usually displace the entire contents of the cache repeatedly
  ◦ They examine the entire heap during a single cycle
• The coprocessor connects directly to the memory controller
  ◦ It does not access memory through the main processor’s cache
  ◦ The cache therefore remains largely unaffected by garbage collection
• The coprocessor ensures cache coherency
  ◦ It inspects and selectively flushes individual cache lines through a dedicated cache port (similar to a snoop port)
• The coprocessor eliminates unnecessary memory traffic
  ◦ At the end of a garbage collection cycle, it invalidates all cache lines that hold dead objects
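The saving from invalidating dead lines can be sketched in Python, assuming a copying collector whose fromspace holds only dead objects (or stale originals whose dirty lines the coprocessor already flushed before copying them) once the cycle ends. The line size, the semispace address ranges, and the names `CopyBackCache`, `invalidate_range`, and `writebacks` are hypothetical. The sketch counts how many dirty lines a copy-back cache would eventually write to memory, with and without discarding the fromspace lines at the end of the cycle.

```python
LINE_SIZE = 16                 # bytes per cache line (hypothetical)
FROMSPACE = (0x0000, 0x1000)   # hypothetical semispace bounds: [start, end)
TOSPACE = (0x1000, 0x2000)


class CopyBackCache:
    """Minimal copy-back cache model: it only tracks which lines are dirty."""

    def __init__(self) -> None:
        self.dirty = set()     # start addresses of dirty cache lines
        self.writebacks = 0    # line writes to memory caused by flushes/evictions

    def write(self, addr: int) -> None:
        # Copy-back: a store only marks the line dirty; memory is not written yet.
        self.dirty.add(addr - addr % LINE_SIZE)

    def flush_all(self) -> None:
        """Write every dirty line back to memory (what eviction eventually does)."""
        self.writebacks += len(self.dirty)
        self.dirty.clear()

    def invalidate_range(self, start: int, end: int) -> int:
        """Discard lines in [start, end) WITHOUT writing them back; return the count."""
        doomed = {a for a in self.dirty if start <= a < end}
        self.dirty -= doomed
        return len(doomed)


def writebacks(invalidate_dead: bool) -> int:
    cache = CopyBackCache()
    for addr in range(0x0000, 0x0100, LINE_SIZE):   # 16 dirty lines of now-dead objects
        cache.write(addr)
    for addr in range(0x1000, 0x1040, LINE_SIZE):   # 4 dirty lines of live tospace objects
        cache.write(addr)
    if invalidate_dead:
        cache.invalidate_range(*FROMSPACE)          # end of the GC cycle
    cache.flush_all()
    return cache.writebacks
```

With these numbers, `writebacks(False)` reports 20 line write-backs and `writebacks(True)` only 4: the 16 lines holding dead data never reach memory.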


A Fully Functional Prototype Exists
And Has Been Used For Performance Measurements

• Main processor and GC coprocessor modeled at register-transfer level in VHDL and synthesized for an Altera APEX 20K1000C FPGA at 25 MHz
• Pipelined, statically scheduled RISC processor
  ◦ Up to 3 instructions per clock cycle (3-way, in-order multiple issue)
  ◦ 16 pointer registers, 16 data registers, 8 predicate registers
  ◦ 8K instruction cache, 8K data cache, 2K attribute cache
  ◦ Two-way set-associative, copy-back caches
• Microcoded garbage collection coprocessor
  ◦ 256 × 80-bit on-chip microcode memory
  ◦ Uses less than 20% of the chip area
• Software
  ◦ Native Java bytecode compiler developed for the architecture; its integrated code scheduler reorders instructions to exploit the processor’s parallel execution units and to hide instruction latencies
  ◦ Subset of the Java class libraries supporting text-based applications, so that representative programs can be run (includes an NFS client)
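The code scheduler’s job for a 3-wide, statically scheduled, in-order machine can be illustrated with a greedy list scheduler. This Python sketch is not the compiler’s actual algorithm: it assumes each register is written exactly once (so only read-after-write dependencies matter), and the instruction names, register names, and latencies in `PROGRAM` are hypothetical. Each inner list is one issue bundle; an empty list is a cycle in which nothing can issue.

```python
from dataclasses import dataclass

ISSUE_WIDTH = 3  # up to 3 instructions issued per clock cycle


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str
    srcs: tuple
    latency: int = 1  # cycles until the result may be used (hypothetical values)


def schedule(instrs: list) -> list:
    """Greedy list scheduling into bundles of at most ISSUE_WIDTH instructions.

    Assumes each register is written exactly once, so only read-after-write
    dependencies constrain the order. An empty bundle is a stall cycle.
    """
    produced = {ins.dest for ins in instrs}   # registers written inside this block
    ready_at = {}                             # register -> first cycle its value is usable
    remaining = list(instrs)
    bundles = []
    cycle = 0
    while remaining:
        bundle = []
        for ins in remaining:                 # consider candidates in program order
            if len(bundle) == ISSUE_WIDTH:
                break
            # Operands from outside the block are ready; others once their latency elapsed.
            if all(s not in produced or ready_at.get(s, cycle + 1) <= cycle for s in ins.srcs):
                bundle.append(ins)
        for ins in bundle:
            remaining.remove(ins)
            ready_at[ins.dest] = cycle + ins.latency
        bundles.append([ins.name for ins in bundle])
        cycle += 1
    return bundles


PROGRAM = [
    Instr("ld r1", "r1", ("p0",), latency=2),   # load through pointer register p0
    Instr("add r2", "r2", ("r1",)),             # needs the loaded value
    Instr("mul r3", "r3", ("d0",), latency=2),  # independent of the load
    Instr("sub r4", "r4", ("d1",)),             # independent of the load
    Instr("or r5", "r5", ("r2", "r3")),
]
```

For `PROGRAM`, the scheduler moves `mul r3` and `sub r4` up next to the load, so their work overlaps the load latency; `add r2` still has to wait one stall cycle for the loaded value.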


An Experimental Computer System Was Assembled
Based On The Garbage-Collection Processor


Pauses Caused By Garbage Collection Do
Not Exceed 500 Clock Cycles

Figure: Frequency distribution of synchronization pauses, shown for javac (x-axis: pause duration in clock cycles; y-axis: frequency)
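At the prototype’s 25 MHz clock, the 500-cycle bound converts to wall-clock time as follows. This is plain arithmetic; at other clock rates the bound in seconds scales inversely with frequency.

```latex
t_{\text{pause}} \le \frac{500\ \text{cycles}}{25 \times 10^{6}\ \text{cycles/s}} = 2 \times 10^{-5}\ \text{s} = 20\ \mu\text{s}
```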


The Runtime Overhead For The Hardware-Based
Garbage Collection Is Small


The Advantages Of This Approach Could Enable
Real-Time Garbage Collection in Embedded Systems

• Limits garbage collection pauses to 500 clock cycles
• Efficient synchronization
• No code overhead
• Low total runtime overhead of only a few percent
• Undisturbed cache locality
• Exact (non-conservative) garbage collection
• Compiler and code are independent of the garbage collector
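Exact collection follows from the strict separation of pointers and data: the collector never has to guess whether a word is an address. This Python sketch contrasts exact marking with conservative marking. It assumes each object stores its pointer fields and data fields separately, a layout modeled on the separate register sets (the slides do not describe the object layout); the heap addresses, `HEAP`, `ROOTS`, and the `reachable` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    ptrs: list = field(default_factory=list)   # addresses of referenced objects
    data: list = field(default_factory=list)   # plain integers, never traced


def reachable(heap: dict, roots: list, exact: bool) -> set:
    """Mark phase: return the set of heap addresses that must be kept alive."""
    marked, stack = set(), list(roots)
    while stack:
        addr = stack.pop()
        if addr in marked or addr not in heap:
            continue
        marked.add(addr)
        obj = heap[addr]
        # Exact: only pointer fields are traced.
        # Conservative: every word that looks like a heap address is traced.
        candidates = obj.ptrs if exact else obj.ptrs + obj.data
        stack.extend(a for a in candidates if a in heap)
    return marked


HEAP = {
    0x100: Obj(ptrs=[0x200], data=[0x300]),  # data word 0x300 is just an integer
    0x200: Obj(data=[7]),
    0x300: Obj(data=[1, 2, 3]),              # unreachable: no pointer refers to it
}
ROOTS = [0x100]  # contents of the pointer registers
```

Exact marking keeps only `0x100` and `0x200`; conservative marking also retains `0x300`, because an integer in a data field happens to equal its address.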


BACKUP


Efficient Implementation


Coprocessor Architecture


Synchronization I


Synchronization II


Synchronization III


Synchronization IV


The Runtime Overhead Is Small - I


The Runtime Overhead Is Small - II


The Runtime Overhead Is Small - III

