Implementing Concurrency Abstractions
                     for Programming
       Multi-Core Embedded Systems
                            in Scheme

                                           Ruben Vandamme


  Promotor: Prof. Dr. Wolfgang De Meuter

   Advisors: Dr. Coen De Roover
             Christophe Scholliers
2
                                     Overview

    
        Embedded systems
    
        Event-driven XMOS chip
    
        Interpreter requirements
    
        Bit Scheme
    
        Modifying Bit Scheme to support XMOS
    
        Demonstration
    
        Contributions & Conclusion
3
                          Embedded software
    
        Increasingly important
        ●
            Digital watches, microwaves, cars, etc.
        ●
            98% of processors used are embedded
    
        Different from PC and server software
        ●
            Interacts with the outside world
        ●
            Reading sensors and buttons, communicating, etc.
    
        Polling: repeatedly check a condition
    
        Interrupts: asynchronous signals
        ●
            Less overhead and lower power use than polling
4
                                              Interrupts
    
        Dedicated hardware for frequent tasks
        ●
            PWM, UART, I²C,…
        ●
            Timing sensitive tasks
        ●
            Interrupts are often used as an interface
    
        Source of various bugs
        ●
            Stack overflow, interrupt overload, …
        ●
            John Regehr (2005)
              Safe and structured use of interrupts in real-time
              and embedded software
5
            Event-driven chip

    
        XMOS XS1-G4
        ●
            No interrupts or polling
    
        Multi-core, multi-threaded
        ●
            Threads supported in HW
        ●
            Guaranteed execution time
        ●
            Message passing
    
        Architecture inspired by the Transputer
    
        Programmed in XC
        ●
            Based on CSP
6
                  Interpreter requirements
    
        Use less than 64 KB memory (per core)
        ●
            Most interpreters need an order of magnitude
            more memory
        ●
            MiniScheme, Pico, etc.
    
        Contain a real-time garbage collector
    
        We need to extend it to
        ●
            exploit concurrency provided by hardware
        ●
            export hardware functionality
              Input & output, timing, ...
7
                                    Bit Scheme
    
        Fits in 64 KB memory
          Byte code + interpreter + runtime memory
    
        Byte code based
          Compiler can remove unneeded functions
    
        No runtime error handling
          Keeps interpreter small
    
        Real-time garbage collector
    
        Served as a basis for our XMOS Scheme
8
            Exploiting XMOS concurrency
    
        We run four interpreters in parallel
        ●
            One on each core
        ●
            Modified compiler and interpreter accordingly
    
        Exploit all memory and IO possibilities
        (par
         (core CORE_0
           ...)
         (core CORE_1
           ...)
         (core CORE_2
           ...)
         (core CORE_3
           ...))
9
  Communication primitives added
 
     Use message passing over channels
     ●
         cout, cin
     ●
         Primitives use hardware
 
     Doesn't support composite types
     ●
         Serialization needed
 (par
  (core CORE_0
    (cout CORE_1 99))
  (core CORE_1
    (display (cin CORE_0))))

 [Diagram: Core 0, Core 1, Core 2, Core 3]
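
The serialization point above can be illustrated with a minimal sketch. It assumes a channel that carries one integer at a time and sends a list as its length followed by its elements. `mock-cout` and `mock-cin` are hypothetical single-threaded stand-ins for `cout` and `cin`, so the sketch runs in any ordinary Scheme; the wire format is an assumption, not necessarily the one used in the thesis.

```scheme
;; Hypothetical stand-in for a hardware channel: a FIFO of integers.
;; On the real system, cout/cin would take the place of these mocks.
(define channel '())

(define (mock-cout value)            ; enqueue one integer
  (set! channel (append channel (list value))))

(define (mock-cin)                   ; dequeue one integer
  (let ((value (car channel)))
    (set! channel (cdr channel))
    value))

;; Serialize a list of integers: length first, then each element.
(define (send-list lst)
  (mock-cout (length lst))
  (for-each mock-cout lst))

;; Deserialize: read the length, then that many elements, in order.
(define (receive-list)
  (let loop ((remaining (mock-cin)) (acc '()))
    (if (= remaining 0)
        (reverse acc)
        (loop (- remaining 1) (cons (mock-cin) acc)))))

(send-list '(3 1 4 1 5))
(display (receive-list))
(newline)
```

Sending the length first lets the receiver know how many reads to perform without needing a reserved terminator value.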
10
                       IO primitives added
                   
                       Initialize and configure IO
                         pon, poff, pconf_in, pconf_out
                   
                       Perform IO
                         pout, pin
                   
                       Wait for an event on IO pins
                         peq, pne
                   
                       Use hardware functionality
(define PORT_CLOCKLED 525056)
(pon PORT_CLOCKLED)
(pconf_out PORT_CLOCKLED 0)
(pout PORT_CLOCKLED 15)
11
                          Time primitives added

     
         We added a notion of time
         ●
             Execute actions at a certain point in time
         ●
             (timer)
               Returns the current time in clockticks
         ●
             (after time)
               Blocks thread until current time is after time
         ●
             Both primitives call hardware functionality
               (define now (timer))
               (define clock 100000000)
               (after (+ now (* 5 clock)))
               (display "5 seconds later")
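
As a sketch of how the two time primitives could be combined into a periodic task: each deadline is computed from the previous deadline rather than from a fresh timer reading, so time spent in the action does not build up as drift. `mock-timer` and `mock-after` are hypothetical stand-ins for `(timer)` and `(after time)` that drive a simulated clock; the 100 MHz tick rate is taken from the slide, everything else is an assumption.

```scheme
(define ticks-per-second 100000000)  ; 100 MHz, as on the slide

;; Hypothetical simulated clock replacing the hardware timer.
(define current-ticks 0)
(define (mock-timer) current-ticks)
(define (mock-after deadline)        ; "block" by advancing the simulated clock
  (if (< current-ticks deadline)
      (set! current-ticks deadline)))

(define (seconds->ticks seconds)
  (* seconds ticks-per-second))

;; Run `action` `count` times, once per `period` ticks.
;; The next deadline is derived from the previous one to avoid drift.
(define (every-period period count action)
  (let loop ((deadline (+ (mock-timer) period)) (n 0))
    (if (< n count)
        (begin
          (mock-after deadline)
          (action n)
          (loop (+ deadline period) (+ n 1))))))

(every-period (seconds->ticks 5) 3
  (lambda (n)
    (display "tick ")
    (display n)
    (display " at ")
    (display (mock-timer))
    (newline)))
```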
12
Handling multiple events at once


 
     Certain primitives are blocking
       pne, peq, cin
 
     Threads need to be able to handle more
     than one event at a time
      (select
       ((select_pne buttons 15)
         (lambda (buttonsvalue)
           (display buttonsvalue)))
       ((select_cin CORE_1)
         display)
       (else (display "default")))
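
The dispatch idea behind `select` can be sketched in plain Scheme. This is a software simulation only: it polls guards in order, whereas the hardware waits for events. It assumes that `else` behaves like a default case that runs when no event is ready. The guard procedures, `select-first`, and the state variables are all hypothetical names introduced for illustration.

```scheme
;; A guard is a procedure of no arguments that returns #f when its event
;; is not ready, or a one-element list holding the event's value.
;; A clause is a pair (guard . handler).
(define (select-first clauses default)
  (if (null? clauses)
      (default)
      (let* ((clause (car clauses))
             (ready ((car clause))))          ; check the guard
        (if ready
            ((cdr clause) (car ready))        ; pass the event value on
            (select-first (cdr clauses) default)))))

;; Hypothetical simulated event sources.
(define buttons-state 15)                     ; 15 stands for "no button pressed"
(define inbox '())                            ; pending channel messages

(define (buttons-guard)                       ; ready when the value differs from 15
  (if (= buttons-state 15) #f (list buttons-state)))

(define (inbox-guard)                         ; ready when a message is queued
  (if (null? inbox)
      #f
      (let ((message (car inbox)))
        (set! inbox (cdr inbox))
        (list message))))

(define clauses
  (list (cons buttons-guard (lambda (value) (list 'buttons value)))
        (cons inbox-guard   (lambda (value) (list 'message value)))))

(define (poll-once)
  (select-first clauses (lambda () 'default)))

(display (poll-once)) (newline)               ; no event is ready
(set! inbox '(99))
(display (poll-once)) (newline)               ; a channel message is ready
(set! buttons-state 14)
(display (poll-once)) (newline)               ; a button event is ready
```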
13
                                        Compilation

     (par
       (core CORE_0 ...)          BC0
       (core CORE_1 ...)          BC1
       (core CORE_2 ...)          BC2
       (core CORE_3 ...)          BC3
     )

     Step 1
     Scheme compiler
     compiles Scheme → bytecode
14
                                                            Compilation

     (par
       (core CORE_0 ...)          BC0
       (core CORE_1 ...)          BC1
       (core CORE_2 ...)          BC2
       (core CORE_3 ...)          BC3
     )

     Step 1
     Scheme compiler
     compiles Scheme → bytecode

     Step 2
     XMOS toolchain
     compiles bytecode + interpreter
     → XMOS executable

     [Diagram: each of BC0, BC1, BC2, and BC3 is bundled with the interpreter]
15
                                 Demonstration
     
         Case study
     
         LED Pulse Width Modulation in Scheme
     
         Communication over Xbee
          ●
              Via UART implemented in Scheme
     [Diagram: four blocks labeled App / Buttons, UART RX, UART TX, and PWM]
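
A rough sketch of what software pulse width modulation looks like when built from output and timing primitives: the pin is driven high for a fraction of each period and low for the rest. `mock-pout` and `mock-after` are hypothetical stand-ins for `pout` and `after` that record writes against a simulated clock. The period, duty cycle, and single-bit output are assumptions for illustration, not the demo's actual parameters.

```scheme
(define current-ticks 0)             ; simulated clock
(define port-log '())                ; recorded (time . value) writes, newest first

(define (mock-after deadline)        ; "block" by advancing the simulated clock
  (if (< current-ticks deadline)
      (set! current-ticks deadline)))

(define (mock-pout value)            ; record an output instead of driving a pin
  (set! port-log (cons (cons current-ticks value) port-log)))

;; Ticks the output stays high for a duty cycle given in percent.
(define (on-ticks period duty-percent)
  (quotient (* period duty-percent) 100))

;; Drive `cycles` PWM periods: high for the on-time, low for the remainder.
(define (pwm period duty-percent cycles)
  (let loop ((start current-ticks) (n 0))
    (if (< n cycles)
        (begin
          (mock-pout 1)
          (mock-after (+ start (on-ticks period duty-percent)))
          (mock-pout 0)
          (mock-after (+ start period))
          (loop (+ start period) (+ n 1))))))

(pwm 1000 25 2)
(display (reverse port-log))
(newline)
```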
16
                Contributions & Conclusion
     
         Ported a Scheme interpreter to the new
         XMOS chip
     
         Exploited the concurrency of the XMOS chip
     
         Added new primitives
         ●
             IO, message passing, time, …
     
         Allows programming hardware from Scheme
     
         Modified compiler accordingly
17




     Questions
