Heterogeneous Multiprocessing
with Android on NXP i.MX 7
Laura Nao, Nicola La Gloria
Kynetics, Santa Clara, CA
About us.
● Kynetics is a full software stack engineering firm
○ Embedded Unit
○ Application Unit
● We support NXP embedded application processors
● Custom Android OS for different industries.
○ Kernel development
○ API Device Integration (HAL)
○ Custom system services (native, Java)
● Continuous building and delivery of artifacts
Outline (1/2)
1. Introduction to AMP
○ SMP vs AMP
○ i.MX7 overview
○ OpenAMP framework
2. RPMsg in Android kernel
○ RPMsg character driver overview
○ Implementation in the Linux kernel
3. Android porting on Colibri i.MX7
○ Kynetics’ Cohesys BSP for Colibri i.MX 7
SMP vs AMP
SMP on homogeneous architectures:
● Single OS controlling two or more identical cores sharing system resources
● Dynamic scheduling and load balancing
[Diagram: applications running on a single OS whose SMP kernel spans Core 1 ... Core n]
Outline (2/2)
4. Cohesys AMP demo - Headless mode
○ Cohesys Android/FreeRTOS demo - Goal
○ Cohesys Android/FreeRTOS demo - HW Setup
○ IMU data sampling (FreeRTOS)
○ Android native IPC client
5. Android AMP demo - Headful mode
○ Android App overview
○ Java bridge to native IPC library (JNI)
○ GUI
6. Hands-on videos
7. Q&A
SMP vs AMP
AMP on heterogeneous
architectures:
● Different OS on each core: full-featured OS alongside a real-time kernel
● Inter-processor communication protocol
● Efficient when the application can be statically partitioned across cores - high performance is achieved locally
[Diagram: applications on an OS and tasks on an OS/RTOS, each on its own core (Core 1 ... Core n), communicating through MCAPI]
Why Heterogeneous Systems?
A growing number of embedded systems require concurrent execution in
segregated environments:
● Real-time performance when accessing certain devices/peripherals
● Power consumption (MCU + MPU systems were used in the past)
○ Data aggregation from sensors
● System integrity: segregation (Rich OS + critical subsystem)
○ Multi-chip approach
○ Virtualization
○ HMP (heterogeneous multiprocessing): the approach covered in this talk
NXP i.MX7 overview
● Cortex-A7 core + Cortex-M4 core
● Master - Slave architecture
○ A7 is the master
○ M4 is the slave
● Inter-processor communication
○ MU - Messaging Unit
○ RPMsg component (OpenAMP framework)
● Safe sharing of I/O resources
○ RDC - Resource Domain Controller
i.MX7 Reference Manual: https://www.nxp.com/docs/en/reference-manual/IMX7DRM.pdf
Why Embedded Android
● Very application oriented: abstraction between low level hardware and
application layers
● Rich UI SDK
○ Native (NDK)
○ Java (SDK)
● Great debugging tools
● Productive development environment
○ Android Studio
○ Gradle based build system
● Almost any Java developer can be an “embedded application developer”
OpenAMP framework: Inter-processor communication
● Transport Layer: RPMsg (RPMsg Lite, OpenAMP Rpmsg, ...)
● MAC Layer: VirtIO/Virtqueue (VirtIO, Virtqueue, Vring)
● Physical Layer: shared memory and inter-core interrupts (Shmem, MU, Mailbox)
OpenAMP framework: MAC (VirtIO)
● virtqueue
○ struct vring
○ short used_idx
○ short avail_idx
○ int (*add_buff)(..)
○ void *(*get_buff)(..)
○ void (*kick)(..)
● Shared memory
○ VRING: vring_desc descriptors, vring_avail, vring_used
○ Buffer list
VirtIO Communication
Master (A7) transmits to Remote (M4)
● Master calls get_buff() on virtqueue1
○ gets a buffer idx from the USED ring
● Master fills the buffer
● Master calls add_buff() on virtqueue1
○ writes the buffer idx in the AVAIL ring and increments idx
● Remote calls get_buff() on the AVAIL ring
○ Remote calls add_buff() on the USED ring (buffer freed)
● Master writes the buffer idx to the USED ring and increments idx
Master (A7) receives from Remote (M4)
● Master calls get_buff() on virtqueue2
○ gets a buffer idx from the tail of the USED ring
● Master calls add_buff() on virtqueue2
○ writes the buffer idx in the AVAIL ring and increments idx
● Remote calls get_buff() on the AVAIL ring and fills the buffer
○ Remote calls add_buff() on the USED ring and increments idx
● Master calls get_buff() on the USED ring
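The transmit sequence above can be sketched with a toy model of the AVAIL and USED rings. This is a simplified, single-threaded illustration and not the OpenAMP or Linux VirtIO implementation: the names toy_ring, toy_vring, master_send and remote_receive, the ring size and the buffer size are all hypothetical, every buffer is assumed to start out free in the USED ring, and memory barriers and the kick() notification are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 4   /* hypothetical; chosen small for the example */
#define BUF_SIZE  32  /* hypothetical buffer size in bytes */

/* A ring of buffer indices with free-running producer/consumer counters. */
struct toy_ring {
    uint16_t slots[RING_SIZE];
    uint16_t head;  /* advanced by the producer */
    uint16_t tail;  /* advanced by the consumer */
};

/* Toy stand-in for one virtqueue placed in shared memory. */
struct toy_vring {
    char buffers[RING_SIZE][BUF_SIZE];
    struct toy_ring avail;  /* master -> remote: buffers ready to be read */
    struct toy_ring used;   /* remote -> master: buffers handed back */
};

static int ring_push(struct toy_ring *r, uint16_t idx)
{
    if ((uint16_t)(r->head - r->tail) == RING_SIZE)
        return -1;                        /* ring full */
    r->slots[r->head % RING_SIZE] = idx;
    r->head++;
    return 0;
}

static int ring_pop(struct toy_ring *r, uint16_t *idx)
{
    if (r->head == r->tail)
        return -1;                        /* ring empty */
    *idx = r->slots[r->tail % RING_SIZE];
    r->tail++;
    return 0;
}

/* Assumption of this sketch: all buffers start out free in the USED ring. */
static void vring_init(struct toy_vring *v)
{
    memset(v, 0, sizeof(*v));
    for (uint16_t i = 0; i < RING_SIZE; i++)
        ring_push(&v->used, i);
}

/* Master: take a free buffer from USED, fill it, publish it in AVAIL. */
static int master_send(struct toy_vring *v, const char *msg)
{
    uint16_t idx;
    if (ring_pop(&v->used, &idx) < 0)
        return -1;                        /* no free buffer */
    strncpy(v->buffers[idx], msg, BUF_SIZE - 1);
    v->buffers[idx][BUF_SIZE - 1] = '\0';
    return ring_push(&v->avail, idx);     /* a real driver would kick() here */
}

/* Remote: read one buffer from AVAIL, then return it through USED. */
static int remote_receive(struct toy_vring *v, char *out, size_t out_len)
{
    uint16_t idx;
    assert(out_len > 0);
    if (ring_pop(&v->avail, &idx) < 0)
        return -1;                        /* nothing pending */
    strncpy(out, v->buffers[idx], out_len - 1);
    out[out_len - 1] = '\0';
    return ring_push(&v->used, idx);
}
```

In this model master_send() fails once all buffers are in flight, and succeeds again only after remote_receive() has returned a buffer through the USED ring, which is the back-pressure the two rings provide.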
OpenAMP framework - RPMsg channels and endpoints
RPMsg character driver
The Linux RPMsg char driver exposes RPMsg endpoints to user-space processes.
● Supports the creation of multiple endpoints for each RPMsg device
● Each created endpoint device shows up as a single character device in /dev
● Provides multiple interfaces:
○ Control interface: allows creation/destruction of endpoint interfaces
○ Endpoint interface (one for each exposed endpoint): allows creation, destruction and interaction with endpoints
The driver was first introduced in Linux 4.11 (the sources are in drivers/rpmsg/rpmsg_char.c in the mainline kernel). More information is available in our technical note.
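Because each endpoint is a character device, a user-space process can talk to it with ordinary file operations. The following is a minimal sketch under stated assumptions: the helper name endpoint_exchange is hypothetical, the device path is supplied by the caller (it depends on the system), and the remote core is assumed to answer every message.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Send one message through an endpoint device and wait for one reply.
 * Returns the number of bytes read, or -1 on error (errno is set).
 */
static ssize_t endpoint_exchange(const char *dev_path,
                                 const void *msg, size_t msg_len,
                                 void *reply, size_t reply_len)
{
    assert(dev_path != NULL && reply != NULL);

    int fd = open(dev_path, O_RDWR);
    if (fd < 0)
        return -1;

    /* Each write() hands one message to the driver for the remote core. */
    ssize_t n = write(fd, msg, msg_len);
    if (n >= 0)
        n = read(fd, reply, reply_len);   /* blocking read of the reply */

    int saved = errno;                    /* keep errno across close() */
    close(fd);
    errno = saved;
    return n;
}
```

A caller would pass the path of an endpoint device that has already been created through the control interface; on a system without such a device the helper simply returns -1 with errno set by open().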
RPMsg character driver
Implementation in Linux Kernel
Cohesys BSP
Board Support Package for the Toradex Colibri i.MX7 SoM:
● Android 7.1.2
● U-Boot 2017.03 (from NXP) + support for .ELF files
● Linux Kernel 4.9 + RPMsg character driver backported from Kernel 4.11
This build is compatible with:
● Colibri i.MX7 eMMC SoM with 1 GB RAM
● Toradex Iris carrier board
● 7” capacitive parallel display from Toradex
Hybrid Android/FreeRTOS Demo - Goal
● FreeRTOS binary running on Cortex-M4
○ Samples the IMU sensor
○ Sends data according to the configured mode:
➢ VECTOR mode - raw acc, mag, gyro data
➢ NORM mode - norm of acc, mag, gyro vectors
● Android executable running on Cortex-A7 [i.e. “headless” mode]
○ Checks inter-core communication and logs received data to a text file
● Android app running on Cortex-A7 [i.e. “headful” mode]
○ Sensor data plotting
Hybrid Android/FreeRTOS Demo - HW Setup
● Toradex Iris Carrier Board w/ Colibri i.MX7 SoM
● Adafruit Precision NXP 9-DOF Breakout Board (via I2C)
○ FXOS8700 3-axis accelerometer and magnetometer
○ FXAS21002 3-axis gyroscope
Architecture Overview
IMU data sampling (FreeRTOS app)
Android native IPC client
Headless Demo Video
https://youtu.be/LjGJndErk8g
Android app overview
JNI - native to Java code
Java: cannot interact directly with the hardware
JNI: glue layer between Java and the lower layers of the OS
● Provides support for interacting with native code such as C/C++
● Maps native methods that interact directly with the hardware
● Java code declares static native methods in any class
● The main Activity loads the native libraries (.so or .dll) where the native methods are implemented (in C) and binds them to the class where they were declared (native).
Android OS and JNI
Picture by Karim Yaghmour
Native IPC library (JNI)
Motivation: the Android app needs to interact with the control interface exposed by the RPMsg char driver:
● Endpoint creation requires an ioctl operation on the control interface
● ioctl operations cannot be issued from Java code
● Low-level operations on RPMsg devices (e.g. creating/destroying endpoints) are handled by native C methods.
[Diagram: the UI app (a Linux process) stacks Activity, JNI wrapper and Native library on top of the RPMsg char driver in the Android kernel]
JNI endpoint creation
#define RPMSG_CREATE_EPT_IOCTL _IOW(0xb5, 0x1, struct rpmsg_endpoint_info)
/* Open controller device */
fd_ctrldev = open("/dev/rpmsg_ctrl0", O_RDONLY);
if (fd_ctrldev < 0) {
    __android_log_print(ANDROID_LOG_INFO, "openDeviceNative", "Error opening controller device: %s\n", strerror(errno));
    return NULL;
}
/* Create endpoint device; ep describes the endpoint to expose */
ret = ioctl(fd_ctrldev, RPMSG_CREATE_EPT_IOCTL, &ep);
if (ret < 0) {
    __android_log_print(ANDROID_LOG_INFO, "openDeviceNative", "Error creating endpoint device: %s\n", strerror(errno));
    close(fd_ctrldev);
    return NULL;
}
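The fragment above leaves out how ep is prepared. A self-contained sketch, outside of JNI, might look like the following. The structure layout mirrors struct rpmsg_endpoint_info from the kernel header linux/rpmsg.h and is declared locally only so that the example compiles without that header; the helper names fill_endpoint_info and create_endpoint are hypothetical, and the endpoint name and addresses are placeholders that have to match what the remote firmware uses.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Local copy of the request layout used by the RPMsg char driver. */
struct rpmsg_endpoint_info {
    char name[32];
    uint32_t src;
    uint32_t dst;
};

#define RPMSG_CREATE_EPT_IOCTL _IOW(0xb5, 0x1, struct rpmsg_endpoint_info)

/* Fill the request; name, src and dst are chosen by the caller. */
static void fill_endpoint_info(struct rpmsg_endpoint_info *ep,
                               const char *name, uint32_t src, uint32_t dst)
{
    assert(ep != NULL && name != NULL);
    memset(ep, 0, sizeof(*ep));
    strncpy(ep->name, name, sizeof(ep->name) - 1);  /* stays NUL-terminated */
    ep->src = src;
    ep->dst = dst;
}

/*
 * Ask the control device to create an endpoint device.
 * Returns 0 on success, -1 on failure (errno is set by open() or ioctl()).
 */
static int create_endpoint(const char *ctrl_path,
                           const struct rpmsg_endpoint_info *ep)
{
    int fd = open(ctrl_path, O_RDONLY);
    if (fd < 0)
        return -1;

    int ret = ioctl(fd, RPMSG_CREATE_EPT_IOCTL, ep);
    close(fd);  /* this sketch does not keep the control descriptor */
    return ret < 0 ? -1 : 0;
}
```

In the demo this logic sits behind a JNI wrapper so that the Activity can trigger it; here it is shown as plain C so that it can be tried from any native executable.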
GUI
Plotting library used: MPAndroidChart (https://github.com/PhilJay/MPAndroidChart)
● VECTOR mode: plots the raw values of the three components (x, y, z) of the acc, mag and gyro vectors.
● NORM mode: plots the norms of the acc, mag and gyro vectors. There is one plot for each sensor.
● NORM mode is selected by default at application startup.
I/O Data Rate
Remote core:
● Samples the IMUs every 10 ms (100 Hz)
● Buffer of 300 elements = 3 KB (TCM memory is only 32 KB; a bigger buffer is possible if the application is moved to DDR)
○ In NORM mode each element is 12 bytes (3 floats * 4 bytes each)
○ In VECTOR mode each element is 36 bytes (9 floats * 4 bytes each)
● Items are dequeued and sent to the master 10 at a time every 100 ms
○ In NORM mode the sending speed is 1.32 KB/s (with RPMsg header)
○ In VECTOR mode the sending speed is 3.67 KB/s (with RPMsg header)
Master core:
● At the driver layer
○ In NORM mode the receiving speed is ~0.93 KB/s (without RPMsg header)
○ In VECTOR mode the receiving speed is ~3.51 KB/s (without RPMsg header)
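The sending figures can be reproduced with a short calculation. The sketch below assumes that each batch of 10 elements travels as a single RPMsg message carrying the standard 16-byte RPMsg header; the struct and function names are hypothetical, and the field layout of the two element types is only a guess consistent with the 12-byte and 36-byte sizes given above.

```c
#include <assert.h>

#define RPMSG_HDR_BYTES 16u  /* standard RPMsg header size */

/* Guessed element layouts matching the sizes on the slide. */
struct norm_sample   { float acc, mag, gyro; };           /* 3 floats */
struct vector_sample { float acc[3], mag[3], gyro[3]; };  /* 9 floats */

/*
 * Bytes per second when `batch` elements of `elem_bytes` each are sent
 * as one message every `period_ms` milliseconds, with `hdr_bytes` of
 * header per message (pass 0 for the payload-only rate).
 */
static unsigned wire_rate(unsigned elem_bytes, unsigned batch,
                          unsigned period_ms, unsigned hdr_bytes)
{
    assert(period_ms > 0 && 1000u % period_ms == 0);
    unsigned msgs_per_s = 1000u / period_ms;
    return msgs_per_s * (batch * elem_bytes + hdr_bytes);
}
```

With these assumptions, wire_rate(12, 10, 100, RPMSG_HDR_BYTES) gives 1360 B/s (about 1.33 KiB/s) and wire_rate(36, 10, 100, RPMSG_HDR_BYTES) gives 3760 B/s (about 3.67 KiB/s), in line with the sending speeds listed for the remote core. The payload-only VECTOR rate is 3600 B/s (about 3.52 KiB/s), close to the ~3.51 KB/s measured at the driver layer; the nominal payload-only NORM rate of 1200 B/s is higher than the ~0.93 KB/s reported there.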
Headful Demo Video
https://youtu.be/2u6bOJbrFW0
Setup Demo Video
https://youtu.be/D5Dh9G9JB18
References
● Kynetics Technical Notes: http://kynetics.com/docs
○ Android Asymmetric Multiprocessing on Toradex Colibri i.MX7D
○ RPMsg device and driver on Linux and Android
○ Android Asymmetric Multiprocessing on i.MX7: Remote Core Sensors Data Streaming in Java
● Kynetics GitHub: https://github.com/kynetics
● OpenAMP project page
● An Introduction to Asymmetric Multiprocessing: When this Architecture can be a Game Changer (ELC 2018)
Q&A
