Multi-threaded RTOS
How multi-threading can increase on-chip parallelism
Outline
 Introduction
 Multi-threading models
 Architectures of multi-threaded processors
 Simultaneous multi-threading and multiprocessors
 Cache design
 Examples of Multi-threaded environments
 Conclusions
Introduction
 Two forms of parallelism
 instruction-level parallelism (ILP)
 thread-level parallelism (TLP)
 Both identify independent instructions that can execute in parallel
 Wide-issue superscalar processors exploit ILP by executing multiple
instructions from a single program in a single cycle.
 Multiprocessors exploit TLP by executing different threads in parallel
on different processors.
 The first multi-threaded processors, in the 1970s and 1980s, applied
multi-threading at the user-thread level to hide memory access latency:
while one thread waits on memory, another thread can issue instructions.
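The latency-hiding argument can be illustrated with a simplified analytical model. This is a sketch that is not taken from the slides: it assumes every thread runs for `run_cycles` between memory accesses, every access costs `latency_cycles`, and switching threads is free. The function name and parameters are hypothetical.

```python
def utilization(threads, run_cycles, latency_cycles):
    """Estimate processor utilization under an idealized latency-hiding model.

    Assumptions (illustrative only): identical threads, a fixed run length
    between memory accesses, a fixed memory latency, zero switch cost.
    """
    # Fraction of time a single thread keeps the processor busy.
    busy_fraction = run_cycles / (run_cycles + latency_cycles)
    # Additional threads fill the idle time until the processor saturates.
    return min(1.0, threads * busy_fraction)


if __name__ == "__main__":
    for n in (1, 2, 4, 5, 8):
        print(n, "threads ->", utilization(n, run_cycles=10, latency_cycles=40))
```

In this model, utilization grows linearly with the thread count until the combined run time of the other threads covers the memory latency, after which extra threads add nothing.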
Introduction
 Motivations for multi-threaded processor architecture development
include chip area, cost, and complexity.
 Simultaneous Multi-threading (SMT),
 Single chip multiprocessing (CMP),
 SMT VLIW architecture,
 Simultaneous multithreaded vector (SMV) architecture
 DSP applications inherently benefit from the following architectural
characteristics:
 Parallelization at multiple levels of hierarchy:
 - Instruction - separate instruction memory space
 - Data - separate data memory space
 - Thread - multiple functional units
 - Data transfer - multiple wide data buses
Vertical and Horizontal Waste
 Vertical waste is introduced when the processor issues no
instructions in a cycle.
 Horizontal waste is introduced when not all issue slots can be
filled in a cycle.
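The two definitions above can be made concrete with a small accounting sketch. It assumes a per-cycle trace of how many instructions were issued; the function name `issue_waste` and the sample trace are hypothetical and only serve the illustration.

```python
def issue_waste(issued_per_cycle, issue_width):
    """Split unused issue slots into vertical and horizontal waste.

    issued_per_cycle: number of instructions issued in each cycle.
    issue_width: number of issue slots available per cycle.
    Returns (vertical_waste, horizontal_waste, total_slots).
    """
    vertical = 0
    horizontal = 0
    for issued in issued_per_cycle:
        if not 0 <= issued <= issue_width:
            raise ValueError("issued count must be between 0 and issue_width")
        if issued == 0:
            # No instruction issued: the whole cycle is vertical waste.
            vertical += issue_width
        else:
            # Some slots filled: the empty remainder is horizontal waste.
            horizontal += issue_width - issued
    total_slots = issue_width * len(issued_per_cycle)
    return vertical, horizontal, total_slots


if __name__ == "__main__":
    trace = [4, 0, 2, 0, 1, 3]  # made-up trace for a 4-wide processor
    vertical, horizontal, total = issue_waste(trace, issue_width=4)
    print("vertical:", vertical, "horizontal:", horizontal, "of", total, "slots")
```

Every slot is either used, vertically wasted, or horizontally wasted, so the three counts always add up to the total number of slots.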
Vertical and Horizontal Waste
Multi-threaded Models
 Fine-Grain Multithreading
 Only one thread issues instructions
each cycle, but it can use the entire
issue width of the processor.
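 SM: Full Simultaneous Issue
 All hardware contexts compete for every issue slot each cycle.
 SM: Single Issue, SM: Dual Issue, SM: Four Issue
 Each thread may issue at most one, two, or four instructions
per cycle, respectively.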
 SM: Limited Connection
 Each hardware context is directly connected to exactly one of
each type of functional unit.
 Partitioning of functional units among threads is less dynamic
than in the other models.
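The difference between the issue models can be sketched for a single cycle. This is an illustrative toy, not the simulator behind the slides: it assumes each thread has a known number of ready instructions, ignores functional-unit types (so the limited-connection model is not covered), and gives slots to threads in list order. All names are hypothetical.

```python
def fine_grain(ready, width, cycle):
    """Fine-grain multithreading: one thread issues per cycle (round-robin),
    but that thread may use the entire issue width."""
    thread = cycle % len(ready)
    return min(ready[thread], width)


def simultaneous(ready, width, per_thread_cap=None):
    """Simultaneous multithreading: all threads compete for issue slots.

    per_thread_cap=None models full simultaneous issue; a cap of 1, 2, or 4
    limits how many instructions each thread may issue in the cycle.
    """
    slots = width
    issued = 0
    for ready_count in ready:
        take = ready_count if per_thread_cap is None else min(ready_count, per_thread_cap)
        take = min(take, slots)  # cannot exceed the remaining issue slots
        issued += take
        slots -= take
    return issued


if __name__ == "__main__":
    ready = [2, 1, 3, 0]  # made-up ready-instruction counts for 4 threads
    print("fine-grain:", fine_grain(ready, width=8, cycle=0))
    print("full simultaneous:", simultaneous(ready, width=8))
    print("single issue cap:", simultaneous(ready, width=8, per_thread_cap=1))
    print("dual issue cap:", simultaneous(ready, width=8, per_thread_cap=2))
```

With these made-up counts, the fine-grain model leaves most of the eight slots empty because only one thread issues, while the simultaneous models fill more slots by drawing from several threads in the same cycle.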
Performance
SMT VLIW Architecture
Simultaneous Vector Multi-threaded Architecture (SVMT)
SMT vs. Multiprocessing
Cache design
Examples of Multi-threaded RTOS
 Analog Devices VDK
 uClinux
 The RTXC Quadros RTOS
 RTXC/ss
 ThreadX
Conclusions
 A simultaneous multithreaded architecture is superior in
performance to a multiple-issue multiprocessor (multi-issue CMP).
 SMT boosts utilization by dynamically scheduling functional units
among multiple threads.
 SMT also increases hardware design flexibility.
 Simultaneous multithreading increases the complexity of instruction
scheduling.
 The increased parallelism makes multi-threading well suited to DSP
applications, where each application can run as a separate thread.
