Implementing Concurrency Abstractions
for Programming Multi-Core Embedded Systems
in Scheme

Ruben Vandamme

Promotor: Prof. Dr. Wolfgang De Meuter
Advisors: Dr. Coen De Roover
          Christophe Scholliers
2
Overview
  • Embedded systems
  • Event-driven XMOS chip
  • Interpreter requirements
  • Bit Scheme
  • Modifying Bit Scheme to support XMOS
  • Demonstration
  • Contributions & Conclusion
3
Embedded software
  • Increasingly important
      ● Digital watches, microwaves, cars, etc.
      ● 98% of the processors in use are embedded
  • Different from PC and server software
      ● Interacts with the outside world
      ● Reading sensors and buttons, communicating, etc.
  • Polling: repeatedly check a condition
  • Interrupts: asynchronous signals
      ● Less overhead and lower power consumption than polling
4
Interrupts
  • Dedicated hardware for frequent tasks
      ● PWM, UART, I²C, …
      ● Timing-sensitive tasks
      ● Interrupts are often used as the interface to this hardware
  • Source of various bugs
      ● Stack overflow, interrupt overload, …
      ● John Regehr (2005), "Safe and structured use of interrupts
        in real-time and embedded software"
5
Event-driven chip
  • XMOS XS1-G4
      ● No interrupts or polling
  • Multi-core, multi-threaded
      ● Threads supported in hardware
      ● Guaranteed execution time
      ● Message passing
  • Inspired by the Transputer
  • Programmed in XC
      ● Based on CSP (Communicating Sequential Processes)
6
Interpreter requirements
  • Must use less than 64 KB of memory (per core)
      ● Most interpreters need an order of magnitude more memory
      ● MiniScheme, Pico, etc.
  • Must contain a real-time garbage collector
  • Must be extended to
      ● exploit the concurrency provided by the hardware
      ● expose hardware functionality
        (input & output, timing, ...)
7
Bit Scheme
  • Fits in 64 KB of memory
    (bytecode + interpreter + runtime memory)
  • Bytecode-based
    (the compiler can remove unneeded functions)
  • No runtime error handling
    (keeps the interpreter small)
  • Real-time garbage collector
  • Served as the basis for our XMOS Scheme
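The slide notes that the Bit Scheme compiler can remove unneeded functions. One common way to do this is reachability over the call graph: keep every definition reachable from the program's entry point and drop the rest. The sketch below assumes the call graph is already available as an association list `((function callee ...) ...)`; the procedure names `callees`, `reachable` and `prune` and the example graph are hypothetical, not Bit Scheme's actual compiler code.

```scheme
;; Hypothetical sketch of reachability-based removal of unused functions.
;; GRAPH is an association list ((function callee ...) ...); this is an
;; illustration, not the Bit Scheme compiler's implementation.

(define (callees graph name)
  ;; Direct callees of NAME; names without an entry (e.g. primitives)
  ;; are treated as leaves with no callees.
  (let ((entry (assq name graph)))
    (if entry (cdr entry) '())))

(define (reachable graph root)
  ;; Depth-first search from ROOT; returns names in the order visited.
  (let visit ((todo (list root)) (seen '()))
    (cond ((null? todo) (reverse seen))
          ((memq (car todo) seen) (visit (cdr todo) seen))
          (else (visit (append (callees graph (car todo)) (cdr todo))
                       (cons (car todo) seen))))))

(define (prune graph root)
  ;; Keep only the definitions whose name is reachable from ROOT.
  (let ((live (reachable graph root)))
    (let loop ((defs graph) (kept '()))
      (cond ((null? defs) (reverse kept))
            ((memq (caar defs) live) (loop (cdr defs) (cons (car defs) kept)))
            (else (loop (cdr defs) kept))))))
```

For example, `(prune graph 'main)` on the graph `((main blink wait) (blink pout) (wait after) (unused-helper blink))` keeps only the `main`, `blink` and `wait` definitions, because nothing reachable from `main` calls `unused-helper`.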
8
Exploiting XMOS concurrency
  • Four interpreters run in parallel
      ● One on each core
      ● Compiler and interpreter modified accordingly
  • Gives access to the memory and IO of every core

(par                ; one core clause per core
  (core CORE_0
    ...)
  (core CORE_1
    ...)
  (core CORE_2
    ...)
  (core CORE_3
    ...))
9
Communication primitives added
  • Message passing over channels
      ● cout, cin
      ● Primitives use the hardware channels
  • Composite types are not supported
      ● Serialization needed

[Figure: the four cores, Core 0 to Core 3]

(par
  (core CORE_0
    (cout CORE_1 99))           ; send 99 to core 1
  (core CORE_1
    (display (cin CORE_0))))    ; wait for a value from core 0, then print it
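Because the channels only carry single values, a composite value has to be serialized before it is sent. A minimal sketch, assuming a flat list of integers and a length-prefix encoding (one possible scheme, not necessarily the one used in the thesis): the sender transmits the element count and then each element, and the receiver reads the count and then exactly that many values. The procedures `send-list`, `receive-list` and `make-fifo` are hypothetical; `send` and `recv` are parameters standing in for `cout` and `cin`, and the FIFO only exists to exercise the code off-chip.

```scheme
;; Hypothetical length-prefixed serialization of a flat list of integers.
;; SEND and RECV stand in for the slide's cout/cin primitives.

(define (send-list send lst)
  ;; Transmit the element count first, then each element in order.
  (send (length lst))
  (for-each send lst))

(define (receive-list recv)
  ;; Read the count, then exactly that many elements.
  (let loop ((n (recv)) (acc '()))
    (if (= n 0)
        (reverse acc)
        (loop (- n 1) (cons (recv) acc)))))

;; In-memory FIFO used only to test the two procedures without hardware.
;; Returns a pair (put! . take!).
(define (make-fifo)
  (let ((items '()))
    (define (put! x) (set! items (append items (list x))))
    (define (take!)
      (let ((x (car items)))
        (set! items (cdr items))
        x))
    (cons put! take!)))
```

On the chip, the same two procedures would be called with closures around `cout` and `cin`, for example `(lambda (x) (cout CORE_1 x))` on the sender and `(lambda () (cin CORE_0))` on the receiver.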
10
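IO primitives added
  • Initialize and configure IO
      pon, poff, pconf_in, pconf_out
  • Perform IO
      pout, pin
  • Wait for an event on IO pins
      peq, pneq
  • Implemented with the hardware port functionality

(define PORT_CLOCKLED 525056)   ; identifier of the port driving the clock LEDs
(pon PORT_CLOCKLED)             ; switch the port on
(pconf_out PORT_CLOCKLED 0)     ; configure the port as an output
(pout PORT_CLOCKLED 15)         ; write 15 (binary 1111) to the port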
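The value 15 written by `pout` is binary 1111, so it drives the four lowest pins of the port high. A small helper can make that intent explicit. This is a hypothetical sketch that assumes bit i of the written value controls pin i of the port; `pins->value` is not one of the primitives on the slide.

```scheme
;; Hypothetical helper: compute the value to write to an output port so
;; that exactly the listed pins are driven high. Assumes bit i of the
;; value controls pin i. Duplicate pin indices are ignored.
(define (pins->value pins)
  (let loop ((ps pins) (seen '()) (value 0))
    (cond ((null? ps) value)
          ((memv (car ps) seen) (loop (cdr ps) seen value))
          (else (loop (cdr ps)
                      (cons (car ps) seen)
                      (+ value (expt 2 (car ps))))))))
```

With this helper, `(pout PORT_CLOCKLED (pins->value '(0 1 2 3)))` would write the same value, 15, as the slide's example.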
11
Time primitives added
  • We added a notion of time
      ● Execute actions at a certain point in time
      ● (timer)
        Returns the current time in clock ticks
      ● (after time)
        Blocks the thread until the current time is after time
      ● Both primitives use the hardware timers

(define now (timer))
(define clock 100000000)        ; clock ticks per second (100 MHz)
(after (+ now (* 5 clock)))     ; block until 5 seconds after now
(display "5 seconds later")
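The deadline passed to `after` is plain tick arithmetic: 5 seconds at 100,000,000 ticks per second is 500,000,000 ticks after `now`. A hardware timer has a fixed width, so deadlines eventually wrap around. The sketch below assumes, as on XS1 devices, a 32-bit timer; `deadline` and `time-after?` are hypothetical names that model the arithmetic, not the interpreter's implementation of `after`.

```scheme
;; Hypothetical model of the deadline arithmetic behind (after ...),
;; assuming a 32-bit hardware timer that wraps around.
(define timer-modulus (expt 2 32))   ; assumed timer width: 32 bits
(define clock 100000000)             ; ticks per second, as on the slide

(define (deadline now seconds)
  ;; Tick value SECONDS after NOW, wrapped to the timer width.
  (modulo (+ now (* seconds clock)) timer-modulus))

(define (time-after? t target)
  ;; Wrap-safe comparison: T is at or after TARGET when the forward
  ;; distance from TARGET to T is less than half the timer range.
  (< (modulo (- t target) timer-modulus) (quotient timer-modulus 2)))
```

Under these assumptions a single wait must stay below 2^31 ticks, about 21.5 seconds at 100 MHz; the 5-second wait on the slide (500,000,000 ticks) is well within that.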
12
Handling multiple events at once
  • Certain primitives block
      pne, peq, cin
  • Threads need to be able to handle more
    than one event at a time

(select
  ((select_pne buttons 15)       ; ready when the value on buttons differs from 15
   (lambda (buttonsvalue)
     (display buttonsvalue)))
  ((select_cin CORE_1)           ; ready when core 1 has sent a value
   display)
  (else (display "default")))    ; taken when no other case is ready
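The behaviour of a `select` with an `else` branch can be modelled in plain Scheme: check each case in order, run the handler of the first case whose event is ready, and otherwise run the default. In this hypothetical sketch each case is a pair `(poll . handler)`, where `poll` returns `(value)` when its event is ready and `#f` otherwise (wrapping the value in a list lets `#f` itself be a valid event value). `select-first` is an illustrative name, and the in-order priority is an assumption of the model.

```scheme
;; Hypothetical software model of select with an else branch.
;; CASES is a list of (poll . handler) pairs; POLL returns (value) when
;; its event is ready, #f otherwise. DEFAULT is a thunk for the else case.
(define (select-first cases default)
  (let loop ((cs cases))
    (if (null? cs)
        (default)                          ; no case ready: else branch
        (let ((event ((caar cs))))         ; poll the current case
          (if event
              ((cdar cs) (car event))      ; first ready case wins
              (loop (cdr cs)))))))
```

Unlike this model, which polls each case in order, the hardware waits on the events themselves; a `select` without an `else` branch would block until one case becomes ready.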
13
Compilation

(par
  (core CORE_0 ...)        BC0
  (core CORE_1 ...)        BC1
  (core CORE_2 ...)        BC2
  (core CORE_3 ...))       BC3

Step 1
Scheme compiler compiles Scheme → bytecode
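Before each core's code can be compiled to its own bytecode image (BC0 to BC3), the `par` form has to be split into one body per core. A minimal sketch of that front-end step, assuming the program is available as a quoted `par` form; `split-par` is a hypothetical name, not the thesis compiler's code.

```scheme
;; Hypothetical front-end step: split (par (core NAME body ...) ...) into
;; a list of (NAME body ...) entries, one per core, so that each body can
;; be compiled to its own bytecode image.
(define (split-par form)
  (if (not (and (pair? form) (eq? (car form) 'par)))
      (error "expected a par form" form)
      (let loop ((clauses (cdr form)) (acc '()))
        (cond ((null? clauses) (reverse acc))
              ((and (pair? (car clauses))
                    (eq? (caar clauses) 'core)
                    (pair? (cdar clauses)))        ; must name a core
               (loop (cdr clauses) (cons (cdar clauses) acc)))
              (else (error "malformed core clause" (car clauses)))))))
```

Applied to the communication example, `split-par` yields one entry for `CORE_0` containing `(cout CORE_1 99)` and one for `CORE_1` containing `(display (cin CORE_0))`.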
14
Compilation

[Figure: the par program compiled to one bytecode image per core (BC0 to BC3); each image is combined with its own copy of the interpreter]

Step 1
Scheme compiler compiles Scheme → bytecode

Step 2
XMOS toolchain compiles bytecode + interpreter → XMOS executable
15
Demonstration
  • Case study
  • LED Pulse Width Modulation in Scheme
  • Communication over XBee
      ● Via a UART implemented in Scheme

[Figure: demo components: Buttons, App, PWM, UART RX, UART TX]
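A UART implemented in software comes down to framing each byte as a sequence of line levels and holding each level for one bit time. The sketch below assumes the common 8N1 format (one start bit at 0, eight data bits sent least-significant bit first, one stop bit at 1); the project's actual UART code is not shown on the slides, and `uart-frame`, `uart-decode` and `bit-time-ticks` are hypothetical names.

```scheme
;; Hypothetical software UART framing, assuming 8N1.
(define (uart-frame byte)
  ;; The ten line levels (0 or 1) for BYTE (0-255), in transmit order.
  (let loop ((i 0) (b byte) (bits '()))
    (if (= i 8)
        (append '(0) (reverse bits) '(1))  ; start bit, data, stop bit
        (loop (+ i 1) (quotient b 2) (cons (remainder b 2) bits)))))

(define (uart-decode levels)
  ;; Inverse of uart-frame; returns #f on a framing error.
  (if (not (and (= (length levels) 10)
                (= (car levels) 0)            ; start bit
                (= (list-ref levels 9) 1)))   ; stop bit
      #f
      (let loop ((data (cdr levels)) (i 0) (weight 1) (byte 0))
        (if (= i 8)
            byte
            (loop (cdr data) (+ i 1) (* weight 2)
                  (+ byte (* weight (car data))))))))

(define (bit-time-ticks clock baud)
  ;; Timer ticks per bit, rounded to the nearest tick.
  (round (/ clock baud)))
```

A transmitter would write each level of `(uart-frame byte)` to the TX pin and wait one bit time between levels. For example, at the 100 MHz clock from the time slide and an assumed 9600 baud, `(bit-time-ticks 100000000 9600)` gives 10417 ticks per bit.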
16
Contributions & Conclusion
  • Ported a Scheme interpreter to the new XMOS chip
  • Exploited the concurrency of the XMOS chip
  • Added new primitives
      ● IO, message passing, time, …
  • Made it possible to program the hardware from Scheme
  • Modified the compiler accordingly
17




     Questions
