A Novel RISC Processor Architecture For
Garbage Collection in Embedded Systems




                © TLB GmbH, Karlsruhe 2012
Buffer Overflows are Responsible for a
Large Number of Today’s Security and Safety Problems




   • In a standard computer system, dynamically growing data
     structures can overwrite unrelated data (buffer overflow)
   • The standard processor architecture lacks protection mechanisms
     against buffer overflows
   • Buffer overflow errors are a common cause of critical security
     vulnerabilities




Garbage Collection Helps Reduce Buffer Overflows,
But Causes High Overhead & Unpredictable Pauses




   • Automatic dynamic memory management releases and compacts
     dynamically allocated memory after its last use
   • Such “garbage collection” removes common sources of
     buffer overflow errors
   • Existing garbage collection is mostly software-based, incurs high
     overhead and causes unpredictable pauses in program execution
   • The limited resources of embedded systems typically do not allow
     for efficient garbage collection in real time

A Novel Approach Enables Parallel Garbage Collection
And Parallel Synchronization in Real-Time




   • The novel RISC processor architecture is optimized for security:
      - Strict separation of pointers from ordinary non-pointer data by using
        distinct register sets for pointers and data
   • A dedicated coprocessor performs the garbage collection:
      - The coprocessor uses an optimized Baker-style copying collector
        algorithm that runs in parallel with the main processor
      - The coprocessor starts a new garbage collection cycle when the
        available memory falls below a chosen threshold
   • Simple hardware extensions to the processor pipeline support the
     synchronization between garbage collector and main processor
      - Key to an efficient implementation that avoids unbounded pauses
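
The collection scheme on this slide can be illustrated with a minimal software sketch in Python. It is not the coprocessor's microcode: it assumes a simple stop-the-world semispace copy with forwarding pointers, whereas the slides describe a Baker-style collector that runs concurrently in hardware. The names Obj, Heap, threshold and collect are hypothetical and chosen only for illustration.

```python
class Obj:
    """Heap object with pointer fields kept apart from plain data,
    mirroring the separate pointer and data register sets."""
    def __init__(self, pointers=None, data=None):
        self.pointers = list(pointers or [])  # references to other objects only
        self.data = list(data or [])          # non-pointer values only
        self.forward = None                   # set once the object has been copied


class Heap:
    def __init__(self, capacity, threshold):
        self.capacity = capacity    # maximum number of objects (assumed unit)
        self.threshold = threshold  # a cycle starts when free space drops below this
        self.space = []             # current semispace
        self.roots = []             # stand-in for the pointer registers
        self.cycles = 0

    def free(self):
        return self.capacity - len(self.space)

    def allocate(self, pointers=None, data=None):
        if self.free() < self.threshold:
            self.collect()
        if self.free() == 0:
            raise MemoryError("heap exhausted")
        obj = Obj(pointers, data)
        self.space.append(obj)
        return obj

    def _copy(self, obj, to_space):
        """Copy obj once and leave a forwarding pointer behind."""
        if obj is None:
            return None
        if obj.forward is None:
            obj.forward = Obj(obj.pointers, obj.data)
            to_space.append(obj.forward)
        return obj.forward

    def collect(self):
        to_space = []
        self.roots = [self._copy(r, to_space) for r in self.roots]
        scan = 0
        while scan < len(to_space):  # scan pointer advances over copied objects
            obj = to_space[scan]
            obj.pointers = [self._copy(p, to_space) for p in obj.pointers]
            scan += 1
        self.space = to_space        # unreachable objects are simply left behind
        self.cycles += 1


heap = Heap(capacity=8, threshold=2)
root = heap.allocate(data=[1])
heap.roots.append(root)
root.pointers.append(heap.allocate(data=[2]))
for _ in range(10):
    heap.allocate(data=[0])          # unreachable garbage
print(heap.cycles, len(heap.space))
```

Because every pointer field is known to be a pointer, the collector never has to guess whether a value is a reference. After a cycle, only heap.roots refers to the surviving copies; the Python variable root still names the stale original, which is why the model treats heap.roots as the registers.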

This Novel Approach Improves Performance
By Leaving The Cache Largely Unaffected


   • Software garbage collectors usually evict the entire contents
     of the cache repeatedly
      - They examine the entire heap during a single cycle
   • The coprocessor connects directly to the memory controller
      - It does not access memory through the main processor’s cache
      - The cache remains largely unaffected by the garbage collection
   • The coprocessor ensures cache coherency
      - It inspects and selectively flushes single cache lines through a
        dedicated cache port (resembles a snoop port)
   • The coprocessor eliminates unnecessary memory traffic
      - It invalidates all cache lines holding dead objects
        at the end of a garbage collection cycle
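
The two cache operations named above, flushing single lines and invalidating lines that hold dead objects, can be modeled with a toy copy-back cache in Python. This is a sketch under assumptions, not the prototype's cache design: the class WriteBackCache, its methods, and the dictionary-based memory are all hypothetical.

```python
class WriteBackCache:
    """Toy copy-back cache: dirty lines reach memory only when flushed."""
    def __init__(self, memory):
        self.memory = memory   # backing store: line address -> value
        self.lines = {}        # cached lines: address -> (value, dirty flag)
        self.writebacks = 0

    def write(self, addr, value):
        self.lines[addr] = (value, True)

    def flush_line(self, addr):
        """Write back one dirty line so a reader that bypasses the cache
        (here: the collector) sees current data."""
        if addr in self.lines:
            value, dirty = self.lines[addr]
            if dirty:
                self.memory[addr] = value
                self.writebacks += 1
                self.lines[addr] = (value, False)

    def invalidate(self, addrs):
        """Drop lines without writing them back; returns how many
        dirty write-backs were avoided."""
        avoided = 0
        for addr in addrs:
            line = self.lines.pop(addr, None)
            if line is not None and line[1]:
                avoided += 1
        return avoided


memory = {}
cache = WriteBackCache(memory)
cache.write(0x10, "live object")
cache.write(0x20, "dead object")
cache.flush_line(0x10)               # collector needs the live object's data
avoided = cache.invalidate([0x20])   # dead line never travels to memory
print(memory, cache.writebacks, avoided)
```

In this model the dead object's line is discarded while still dirty, so it never costs a memory write; that is the traffic saving the slide refers to.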




A Fully Functional Prototype Exists
And Has Been Used For Performance Measurements

   • Main processor and GC coprocessor modeled at register transfer level
     in VHDL, synthesized for Altera APEX 20K1000C (at 25 MHz)
   • Pipelined RISC processor, statically scheduled
      - Up to 3 instructions per clock cycle (3-way multiple issue, “in order”)
      - 16 pointer registers, 16 data registers, 8 predicate registers
      - 8K execution cache, 8K data cache, 2K attribute cache
      - Two-way set-associative copy-back cache
   • Micro-coded garbage collection coprocessor
      - 256 x 80 bit on-chip microcode memory
      - Uses less than 20% of the chip surface area
   • Software
      - Native Java bytecode compiler developed for the architecture.
        An included code scheduler rearranges instructions to take advantage of
        the processor’s parallel execution units and to hide instruction latencies
      - Subset of the Java class libraries supporting text-based applications, so
        that representative programs can be executed (includes an NFS client)

An Experimental Computer System Was Assembled
Based On The Garbage-Collection Processor




Pauses Caused By Garbage Collection Do
Not Exceed 500 Clock Cycles


  [Figure: frequency distribution of synchronization pauses (shown for javac);
   x-axis: pause duration in clock cycles, y-axis: frequency]


The Runtime Overhead For The Hardware-Based
Garbage Collection Is Small




The Advantages Of This Approach Could Enable
Real-Time Garbage Collection in Embedded Systems



   • Pauses caused by garbage collection are limited to 500 clock cycles
   • Efficient synchronization
      - No code overhead
      - Low total runtime overhead of only a few percent
   • Undisturbed cache locality
   • Exact (non-conservative) garbage collection
   • Compiler and code are independent of the garbage collector




BACKUP




Efficient Implementation




Coprocessor Architecture




Synchronization I




Synchronization II




Synchronization III




Synchronization IV




The Runtime Overhead Is Small - I




The Runtime Overhead Is Small - II




The Runtime Overhead Is Small - III




