SlideShare a Scribd company logo
1 of 42
Download to read offline
ARM FDPIC Toolchain and ABI
2015, September 24th
Linaro Connect San Francisco 2015
STMicroelectronics Compilation Group: C. Bertin, M. Guêné, C. Monat
11:15 Motivation Christian Bertin
11:20 Definitions
11:25 The FDPIC ABI
11:40 ARM FDPIC Toolchain
11:55 Conclusion/Perspectives
12:00 Questions and Answers
Agenda
Time
Speaker
Presentation
2
Motivation
•  ARM MMU-less cores can run “Embedded” Linux
•  Kernel configured with NO-MMU in menuconfig
•  µClibc library used as userland C library
•  Multi-threading support (NPTL) with Thread Local Storage (TLS)
•  MMU-less Linux requested from our Automotive and Microcontroller divisions
•  Embedded applications can benefit from all Linux functionalities
•  Coping with strong memory requirements on overall system
•  Kernel and userland can be compiled in Thumb mode if possible
•  µClibc contributes to smaller footprint
•  Shared libraries are needed to run Linux applications and bring important savings
by regrouping read-only sections (executable code and R/O data) into sharable
“text” PIC segments
•  Risk of memory fragmentation must be mitigated by the application
3
Motivation (cont’d)
•  Linux uses disjoint (virtual) address spaces for its processes
•  For all processes, the text and data segments can be grouped in a way that they
can always be loaded with a fixed relative position to each other.
•  In all load modules, the offset from the start of the text segment to the start of the
data segment is the same (constant).
•  This property greatly simplifies the design of shared library machinery and is a
foundation of the ARM PIC ABI for Linux shared libraries.
•  W/o MMU, all processes share the same (physical) address space
•  To benefit from share-able text among multiple processes, text and data
segments are placed at arbitrary locations relative to each other.
•  There is a need for a mechanism whereby executable code will always be able to
find its corresponding data.
•  This fairly complicates the design of shared library machinery and leads to the
definition of the ARM FDPIC ABI for MMU-less Linux shared libraries.
4
Definitions
Definitions : PIC 6
•  Position Independent Code (PIC) executes regardless of its load
address in memory without modification by the loader
•  Commonly used for shared libraries (Dynamic Shared Objects : DSOs)
•  Enable code sharing
•  Executables in general are not entirely PIC
•  But sometimes they are Position Independent Executable (PIE), i.e. binaries
entirely made from PIC code (Security PaX or ExecShield)
•  PIE is more costly than non-PIE
•  PIC differs from relocatable : it requires NO processing of the text
segment to make it suitable for running at any location
PIC properties
Definitions : PIC 7
•  Position-independence imposes the following requirements on code:
•  All branches must be PC-relative
•  Function calls are done through entries in the Procedure Linkage Table (PLT)
•  Note : mandatory for extern or pre-emptible functions
•  Code that forms an absolute address referring to any address in a DSO text
segment is forbidden : it would have to be relocated at load-time, making it non-
sharable
•  All references to the data segment and to constants and literals in the text
segment must be PC-relative
•  Distance between text and data segments in a module can be constant
•  Code that references symbols that are or may be imported from other load modules
must use indirect addressing through a table, the Global Offset Table (GOT)
•  Indirect load and stores must be used for items that may be dynamically bound. In
both cases the indirection is done through the GOT allocated by the linker and
initialized by the dynamic loader
PIC code requirements
Definitions : Dynamic Loader/Linker 8
•  The dynamic loader/linker is a component of the system that:
•  locates all load modules belonging to an application
•  loads them into memory
•  binds the symbolic references amongst them by resolving ‘relocations’
From kernel to running application
Kernel
loads:
Application
Dynamic loader/
linker :
ld-uClibc.so
Dynamic
loader:
Application
libc.so
libfoo.so
ld-uClibc.so
Dynamic
linker:
Application
• Patch data
libc.so
• Patch data
libfoo.so
• Patch data
ld-uClibc.so
Control to
main startup
Application
libc.so
libfoo.so
ld-uClibc.so
Definitions : Pre-emption and visibility 9
•  Because of symbol pre-emption, the final linkage value of a global
symbol might not be known until run time
•  Thus, defined symbols that are considered bound at link time must
have the appropriate visibility attributes set: either protected or hidden.
Compiler and assembler define attributes to set these properties.
This enables code generation optimization.
ELF visibility attributes
Symbol Attribute Visible Pre-emptible
“default” : locals No No
“default” : globals, weaks Yes Yes
“protected” Yes No
“hidden” No No
“internal” (processor specific) No No
The FDPIC ABI
FDPIC ABI 11
•  Absence of MMU means:
•  Processes share a common address space : ‘text’ and ‘data’ relative positions are
arbitrary. Data segment address cannot be rematerialized with a PC-relative offset.
•  Need a mechanism for executable code to find its matching data:
•  A dedicated register, the FDPIC register, always holds the start address of the data
segment of the current load module.
•  Non-constant offset between code and data also complicates function
pointer representation
•  A function pointer is represented as the address of a Function Descriptor
composed of two words:
•  The address of the actual code to execute
•  The corresponding data segment address (to be loaded into the FDPIC register)
MMU-less environment PIC constraints
FDPIC ABI 12
•  As both code and data must be relocatable, executable code shall not
contain any instruction containing a pointer value
•  Pointers to global data are indirectly referenced by the Global Offset
Table (GOT)
•  At load-time, pointers contained in the GOT are relocated by the
dynamic linker to point to their actual location
Pure PIC constraints
FDPIC ABI 13
•  The FDPIC register :
•  is used as a base register for accessing the Global Offset Table (GOT) and the
Function Descriptor Table (FUNCDESC) of a given load module. The GOT and
FUNCDESC tables are usually mapped at the start of the data segment
•  is defined to be R9
•  is treated as caller-saved register (as opposed to callee-saved)
•  R9 is considered to be platform-specific in the ARM ABI
•  A caller function must save R9 before doing the actual call either on
the stack or in one of the callee-saved registers: R4-R8, R10, R11,
R13
•  A caller must restore R9 after the call,
•  R9 in == R9 out
The FDPIC register
FDPIC ABI 14
•  Upon entry to a function, the FDPIC register (R9) contains the GOT
address of the load module, where the function sits
•  The FDPIC register can be materialized in three ways:
1.  by being inherited from the calling function in the case of a direct call to a function
within the same ‘load module’
2.  by being set either in a PLT entry or in inlined PLT code
3.  by being set from a function descriptor as part of an indirect function call
•  A Procedure Linkage Table entry contains the code to establish the
new R9 value, retrieve the address of the actual code and branch to it
•  The function alone would have no means to materialize its own FDPIC value.
FDPIC register materialization
FDPIC ABI 15
•  Making direct calls to functions external to a given ‘load module’
requires a PLT entry (thunking)
•  This inter-linkage code establishes the FDPIC value of the targeted function
•  It would not be possible in the prologue of the function
•  The PLT entry contains
1.  instructions for setting the FDPIC register, fetching the function’s start address
and branching to it
2.  the offset of a function descriptor associated with the function
•  The function descriptor is located in the caller’s FUNCDESC table at
a fixed offset from the address specified by the FDPIC register of the
caller
Procedure Linkage Table a.k.a. PLT
FDPIC ABI 16
•  A PLT entry may look like this (ip is the intra-procedure call scratch
register) :
•  A call to a function will look like this
•  A PLT entry for foo is part of .plt sections of all modules calling foo.
A function descriptor for foo is allocated in FDTs of all foo callers .
Procedure Linkage Table entry code
plt(foo): ldr ip, .L1 # foo’s descriptor offset
add ip, ip, r9 # from caller’s FDT
ldr r9, [ip, #4] # foo’s data address
ldr pc, [ip] # foo’s code address
L1. .word foo(GOTOFFFUNCDESC)
bl foo(PLT)
Use of PLT and FUNCDESC (global) 17
LM1 text segment
data segment
r9
bl foo(PLT) pc
Calling foo in another load module
foo @
foo’s GP
LM2 text segment
GOT
GOT
PLT’s foo
foo’s code
foo(GOTOFFFUNCDESC) FUNCDESC
FUNCDESC
pc’
r9’
FDPIC ABI 18
•  The GOT contains words that hold the location of global data
•  To access global data:
•  code must use an FDPIC-relative load to fetch the data address from the GOT
•  the data item is then accessed using the address obtained from the GOT
•  The link editor is responsible for determining the precise GOT layout
•  The GOT initialization is performed at load time by the dynamic linker
•  Actually the dynamic linker resolves relocations
Global Offset Table a.k.a. GOT
Reading global data 19
•  Usual FDPIC sequence:
Data foo is actually global
ldr r2, .L3 # Load GOT offset
ldr r3, [r9, r2] # Add FDPIC register and load foo’s @
ldr r3, [r3, #0] # Read foo
.L4
.align 2
.L3
.word foo(GOT)
•  The static linker will allocate memory into GOT area to hold foo’s
address in all modules that access foo.
•  It will replace foo(GOT) by an offset between the start of GOT and the
GOT entry that contains foo’s address
Use of GOT for global access 20
_GLOBAL_OFFSET_TABLE_
GOT
@foo
LM text segment
data segment
r9
foo’s address offset
foo(GOT)
pc
foo
Data item foo is in LM or in another load module
Reading global data 21
•  Usual FDPIC sequence:
bar is actually in the load module (static, protected, hidden, internal)
ldr r3, .L3 # Load bar offset from b/o GOT
add r3, r3, r9 # Add FDPIC register to create bar’s @
ldr r3, [r3, #0] # Read bar
.L4
.align 2
.L3
.word bar(GOTOFF)
•  The static linker will allocate memory into GOT area to hold bar’s
address
•  It will replace bar(GOTOFF) by the offset between start of GOT and
bar’s address
Use of R9 for local access 22
GOT
text segment
data segment
r9
bar’s offset from PC
pc
bar
bar’s offset from GP
bar(GOTOFF)
Data item bar is in the data segment of the load module
Taking the address of a global function 23
•  FDPIC sequence:
• The static linker will replace GOTFUNCDESC with the offset
between GOT and the entry that contains function descriptor address.
Function foo is actually global
ldr r3, .L3 # Load GOT offset
ldr r3, [r3, r9] # Read foo’s function descriptor @ :
# FDPIC register + offset
.L4
.align 2
.L3
.word foo(GOTFUNCDESC)
Use of GOT and FUNCDESC (global) 24
LM1 text segment
data segment
r9
foo’s descr. offset
pc
function foo is in another load module
foo’s desc. @
foo @
foo’s GP
LM2 text segment
GOT
GOT
PLT’s foo
foo’s code
foo(GOTFUNCDESC)
FUNCDESC
Taking the address of a local function 25
•  Usual FDPIC sequence:
•  The static linker will allocate a descriptor for bar function and replace
GOTOFFFUNCDESC by the offset between the GOT address and
the address of bar’s function descriptor
function bar is actually local (static, protected, hidden, internal)
ldr r0, .L3 # Load bar’s descriptor entry offset
.LPIC0:
add r0, r0, r9 # Create bar’s FUNCDESC address
# from FDPIC register
.L4
.align 2
.L3
.word bar(GOTOFFFUNCDESC)
Use of GOT and FUNCDESC (local) 26
text segment
data segment
r9
bar’s descriptor offset
bar(GOTOFFFUNCDESC)
pc
bar is local and its function descriptor is in the FUNCDESC table
FUNCDESC
bar’s desc. @
GOT
bar@
bar’s GP
bar’s code
Direct jumping into a function
•  Usual PIC sequence:
27
•  FDPIC sequence:
bl myFunction
push r9
bl myFunction # may go thru PLT
pop r9
Indirect jumping : function given as a
parameter
•  PIC Usual sequence ( r0 : function
@ ):
28
•  FDPIC sequence (r0 : function descriptor @) :
…
blx r0
…
…
ldr r3, [r0, #0] # Read function @
mov r4, r9 # Save FDPIC reg
ldr r9, [r0, #4] # Read module
# FDPIC value
blx r3 # Call
mov r9, r4 # Restore FDPIC reg
…
FDPIC ABI 29
Same parameter passing conventions but calling conventions are
different and incompatible:
•  the GOT address must be materialized in r9 upon function entry
•  Mechanisms used for accessing global data are different and
incompatible
•  Function pointers are represented as 2-word function descriptors
•  ‘text’ and ‘data’ segments may be relocated by different amounts
•  The ARM FDPIC ABI adds specific relocations
Summary of differences vs. ARM PIC ABI
ARM Linux FDPIC Toolchains
for ARM mmu-less Cortex-M and -R cores
Credit: Mickael Guêné (ST ARM compilation team)
ARM FDPIC Toolchain
•  FDPIC toolchains exist for Blackfin, FR-V, Kalray processors
•  ST has implemented the FDPIC ABI in GNU toolchains for ARM v7-
R (CR4), ARM-v7-M (CM4) in Thumb and ARM modes.
•  It generates code suitable for a Linux MMU-less execution environment
•  It supports code generation for:
•  Kernel compilation
•  Shared libraries and dynamically linked binaries
•  Fully static binaries
•  It gives access to Linux functionality thru µclibc library, w/ multi-threading (NPTL,
TLS), but limited fork() due to µclibc.
•  It supports debugging of dynamically linked application with shared libs
•  It supports userland emulation with qemu, proot and ARMv7 R and M Linux rootfs
•  ST implemented the few kernel patches, which allow the kernel to
load and correctly handle FDPIC binaries and DSOs.
31
At a glance
FDPIC ABI Toolchain Support 32
•  The compiler is configured by default to generate FDPIC code
•  The –mfdpic option is implicit in the compiler configuration
•  The __FDPIC__ cpp macro is defined
•  Note that the __PIC__ macro is also defined since PIC code is generated
•  No change required in Linux applications build systems vs. FLAT or BFLAT
•  The –mno-fdpic option can be used to disable the FDPIC code generation
•  This is mandatory to build the Linux kernel
•  It is possible to mix static and dynamic libraries and link your applications with them
•  Use –Wl,-Bstatic –lfoo –Wl,-Bdynamic -lbar in the linker command
•  It is possible to generate FDPIC static applications
•  Use –static in the linker command
•  All libraries can be linked in the final application
•  Such a static binary can be also loaded at any arbitrary address
•  LAZY binding is disabled, since it cannot be supported in a thread-safe way
•  Race condition on non-atomic 64-bit load of function descriptor on ARM
Driver options
ARM FDPIC Toolchain
•  ARM v7-R toolchain in production for Cortex R4 in an Automotive
application
•  ARM v7-M for Cortex M4 as mature as CR4 but not in production yet
•  Both toolchains build a root file system with BuildRoot, which allows
to boot Linux and run BusyBox and mplayer.
•  proot, qemu and ARM v7-R and –M available rootfs images allow to
execute an ARM Linux application on an Intel Linux workstation
33
Maturity
ARM FDPIC Toolchain 34
Component Version Patches
Kernel 3.10-3.12 •  Loader upgrade for FDPIC binaries
•  ptrace extension (loadmap service + addresses of
data and code segments)
•  signal handlers (2 word address pointers)
•  futex for MMU-less
GCC 4.7.3 •  Large patch for FDPIC code generation
•  4.8 prototype
Binutils 2.22 •  New relocations in BFD
•  Large generic patch in the linker
GDB 7.5 •  server + client
•  Handling of dynamic objects, FDPIC awareness
µclibc library April 2012 •  generic patch in dynamic linker,
•  start-up code, relocations, ARM architecture
dependencies
qemu 1.5 •  Same work in loader as in kernel
Components
Conclusions, Perspectives
What’s next?
•  Source code is available on GitHub, not upstreamed yet:
•  https://github.com/mickael-guene/fdpic_manifest
•  For each component, two Git branches have been defined one for
Cortex-R and one for Cortex-M
•  Plan A, if community interest, we would have to follow these steps:
•  Upstream kernel patches. They have not been submitted to kernel.org, just one
bug entered on futex in mmu-less context
•  Merge source modification to gcc, binutils, gdb, uclibc, qemu and parameterize
machine dependencies (difficulty with binutils and uclibc)
•  Merge binary toolchain to support both ARM Cortex-M and –R with multilib
•  Investigate a replacement of uclibc with glibc or Musl C (http://www.musl-libc.org)
•  Will it be possible to fix lazy procedure linkage support?
•  Upgrade to newer versions of components for upstream
•  Plan B: Share and maintain current sources on GitHub
36
Conclusion
References 37
•  GNU binutils documentation
•  http://www.gnu.org/software/binutils
•  ARM ABI Specification
•  http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf
•  Relocations
•  http://www.mindfruit.co.uk/2012/06/relocations-relocations.html
•  DSOs engineering
•  http://www.akkadia.org/drepper/dsohowto.pdf
Further reading
Questions, Comments
Reading global data 39
•  Usual PIC sequence:
Data is actually global
ldr r2, .L3 # Load offset from PC to GOT
.LPIC0
add r2, pc, r2 # Build GOT address
ldr r3, .L3+4 # Load offset of foo entry in the GOT
ldr r3, [r2, r3] # Read foo’s address
ldr r3, [r3] # Read foo’s value
.L4
.align 2
.L3
.word _GLOBAL_OFFSET_TABLE_-(.LPIC0+8)
.word foo(GOT)
Reading global data 40
•  Usual PIC sequence:
Data is actually in the load module (static, protected, hidden, internal)
ldr r3, .L7 # Load bar offset from ins. below
.LPIC1:
add r3, pc, r3 # Load bar’s address
ldr r3, [r3, #0] # Read bar
.L4
.align 2
.L7
.word bar-(.LPIC1+8)
Taking the address of a global function 41
•  Usual PIC sequence:
Function foo is actually global
ldr r3, .L3 # Load global table offset
.LPIC0:
add r3, pc, r3 # Compute GOT address
ldr r2, .L3+4 # Load foo entry offset
ldr r3, [r3, r2] # Read foo’s address
.L4
.align 2
.L3
.word _GLOBAL_OFFSET_TABLE_-(.LPIC0+8)
.word foo(GOT)
Taking the address of a local function 42
•  Usual PIC sequence:
function bar is actually local (static, protected, hidden, internal)
ldr r0, .L3 # Read offset between here and bar @
.LPIC0:
add r0, pc, r0 # Create bar’s address from PC
.L4
.align 2
.L3
.word bar-(.LPIC0+8)

More Related Content

What's hot

High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
ScyllaDB
 

What's hot (20)

FPGAによる大規模データ処理の高速化
FPGAによる大規模データ処理の高速化FPGAによる大規模データ処理の高速化
FPGAによる大規模データ処理の高速化
 
How to hide your browser 0-days
How to hide your browser 0-daysHow to hide your browser 0-days
How to hide your browser 0-days
 
Continguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux KernelContinguous Memory Allocator in the Linux Kernel
Continguous Memory Allocator in the Linux Kernel
 
Linux Serial Driver
Linux Serial DriverLinux Serial Driver
Linux Serial Driver
 
CPU / GPU高速化セミナー!性能モデルの理論と実践:理論編
CPU / GPU高速化セミナー!性能モデルの理論と実践:理論編CPU / GPU高速化セミナー!性能モデルの理論と実践:理論編
CPU / GPU高速化セミナー!性能モデルの理論と実践:理論編
 
LinuxIO-Introduction-FUDCon-2015
LinuxIO-Introduction-FUDCon-2015LinuxIO-Introduction-FUDCon-2015
LinuxIO-Introduction-FUDCon-2015
 
Go로 새 프로젝트 시작하기
Go로 새 프로젝트 시작하기Go로 새 프로젝트 시작하기
Go로 새 프로젝트 시작하기
 
Token Bonding Workshop at Dappcon 18
Token Bonding Workshop at Dappcon 18Token Bonding Workshop at Dappcon 18
Token Bonding Workshop at Dappcon 18
 
GPUをJavaで使う話(Java Casual Talks #1)
GPUをJavaで使う話(Java Casual Talks #1)GPUをJavaで使う話(Java Casual Talks #1)
GPUをJavaで使う話(Java Casual Talks #1)
 
Pythonで動かして学ぶ機械学習入門_予測モデルを作ってみよう
Pythonで動かして学ぶ機械学習入門_予測モデルを作ってみようPythonで動かして学ぶ機械学習入門_予測モデルを作ってみよう
Pythonで動かして学ぶ機械学習入門_予測モデルを作ってみよう
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 
NeXTBSD aka FreeBSD X
NeXTBSD aka FreeBSD XNeXTBSD aka FreeBSD X
NeXTBSD aka FreeBSD X
 
Linux Programming
Linux ProgrammingLinux Programming
Linux Programming
 
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgen
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgenIntel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgen
Intel AVX-512/富岳SVE用SIMDコード生成ライブラリsimdgen
 
あるRISC-V CPUの 浮動小数点数(異常なし)
あるRISC-V CPUの 浮動小数点数(異常なし)あるRISC-V CPUの 浮動小数点数(異常なし)
あるRISC-V CPUの 浮動小数点数(異常なし)
 
自作かな漢字変換「Genji」をつくったよ
自作かな漢字変換「Genji」をつくったよ自作かな漢字変換「Genji」をつくったよ
自作かな漢字変換「Genji」をつくったよ
 
SCOMと管理パックで DBや仮想環境の監視のお悩みを解決
SCOMと管理パックでDBや仮想環境の監視のお悩みを解決SCOMと管理パックでDBや仮想環境の監視のお悩みを解決
SCOMと管理パックで DBや仮想環境の監視のお悩みを解決
 
LISA2019 Linux Systems Performance
LISA2019 Linux Systems PerformanceLISA2019 Linux Systems Performance
LISA2019 Linux Systems Performance
 
Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)
Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)
Veriloggen: Pythonによるハードウェアメタプログラミング(第3回 高位合成友の会 @ドワンゴ)
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
 

Viewers also liked

141 deview 2013 발표자료(박준형) v1.1(track4-session1)
141 deview 2013 발표자료(박준형) v1.1(track4-session1)141 deview 2013 발표자료(박준형) v1.1(track4-session1)
141 deview 2013 발표자료(박준형) v1.1(track4-session1)
NAVER D2
 
Program Structure in GNU/Linux (ELF Format)
Program Structure in GNU/Linux (ELF Format)Program Structure in GNU/Linux (ELF Format)
Program Structure in GNU/Linux (ELF Format)
Varun Mahajan
 
SECR'13 Lightweight linux shared libraries profiling
SECR'13 Lightweight linux shared libraries profilingSECR'13 Lightweight linux shared libraries profiling
SECR'13 Lightweight linux shared libraries profiling
OSLL
 

Viewers also liked (16)

ARM cortex A15
ARM cortex A15ARM cortex A15
ARM cortex A15
 
141 deview 2013 발표자료(박준형) v1.1(track4-session1)
141 deview 2013 발표자료(박준형) v1.1(track4-session1)141 deview 2013 발표자료(박준형) v1.1(track4-session1)
141 deview 2013 발표자료(박준형) v1.1(track4-session1)
 
Exploiting arm linux
Exploiting arm linuxExploiting arm linux
Exploiting arm linux
 
Arm cortex R(real time)processor series
Arm cortex R(real time)processor series Arm cortex R(real time)processor series
Arm cortex R(real time)processor series
 
Program Structure in GNU/Linux (ELF Format)
Program Structure in GNU/Linux (ELF Format)Program Structure in GNU/Linux (ELF Format)
Program Structure in GNU/Linux (ELF Format)
 
Address Binding Scheme
Address Binding SchemeAddress Binding Scheme
Address Binding Scheme
 
Execution
ExecutionExecution
Execution
 
Architecture and Implementation of the ARM Cortex-A8 Microprocessor
Architecture and Implementation of the ARM Cortex-A8 MicroprocessorArchitecture and Implementation of the ARM Cortex-A8 Microprocessor
Architecture and Implementation of the ARM Cortex-A8 Microprocessor
 
Compilation and Execution
Compilation and ExecutionCompilation and Execution
Compilation and Execution
 
SECR'13 Lightweight linux shared libraries profiling
SECR'13 Lightweight linux shared libraries profilingSECR'13 Lightweight linux shared libraries profiling
SECR'13 Lightweight linux shared libraries profiling
 
Linkers And Loaders
Linkers And LoadersLinkers And Loaders
Linkers And Loaders
 
Qualcomm Snapdragon Processor
Qualcomm Snapdragon ProcessorQualcomm Snapdragon Processor
Qualcomm Snapdragon Processor
 
ARM Processors
ARM ProcessorsARM Processors
ARM Processors
 
Q4.11: ARM Architecture
Q4.11: ARM ArchitectureQ4.11: ARM Architecture
Q4.11: ARM Architecture
 
A5 processor
A5 processorA5 processor
A5 processor
 
ARM Processor
ARM ProcessorARM Processor
ARM Processor
 

Similar to SFO15-406: ARM FDPIC toolset, kernel & libraries for Cortex-M & Cortex-R mmuless cores

Session 9 advance_verification_features
Session 9 advance_verification_featuresSession 9 advance_verification_features
Session 9 advance_verification_features
Nirav Desai
 
F5 link controller
F5  link controllerF5  link controller
F5 link controller
Jimmy Saigon
 
CICS TS 5.1 Loader Domain enhancements
CICS TS 5.1 Loader Domain enhancementsCICS TS 5.1 Loader Domain enhancements
CICS TS 5.1 Loader Domain enhancements
Larry Lawler
 

Similar to SFO15-406: ARM FDPIC toolset, kernel & libraries for Cortex-M & Cortex-R mmuless cores (20)

loaders and linkers
 loaders and linkers loaders and linkers
loaders and linkers
 
Loader and Its types
Loader and Its typesLoader and Its types
Loader and Its types
 
Arm architecture chapter2_steve_furber
Arm architecture chapter2_steve_furberArm architecture chapter2_steve_furber
Arm architecture chapter2_steve_furber
 
SYBSC IT SEM IV EMBEDDED SYSTEMS UNIT IV Designing Embedded System with 8051...
SYBSC IT SEM IV EMBEDDED SYSTEMS UNIT IV  Designing Embedded System with 8051...SYBSC IT SEM IV EMBEDDED SYSTEMS UNIT IV  Designing Embedded System with 8051...
SYBSC IT SEM IV EMBEDDED SYSTEMS UNIT IV Designing Embedded System with 8051...
 
Embedded C.pptx
Embedded C.pptxEmbedded C.pptx
Embedded C.pptx
 
assem.ppt
assem.pptassem.ppt
assem.ppt
 
Session 9 advance_verification_features
Session 9 advance_verification_featuresSession 9 advance_verification_features
Session 9 advance_verification_features
 
Unit 2.1. cpu
Unit 2.1. cpuUnit 2.1. cpu
Unit 2.1. cpu
 
Embedded _c_
Embedded  _c_Embedded  _c_
Embedded _c_
 
4bit pc report[cse 08-section-b2_group-02]
4bit pc report[cse 08-section-b2_group-02]4bit pc report[cse 08-section-b2_group-02]
4bit pc report[cse 08-section-b2_group-02]
 
4bit PC report
4bit PC report4bit PC report
4bit PC report
 
newerahpc grid
newerahpc gridnewerahpc grid
newerahpc grid
 
F5 link controller
F5  link controllerF5  link controller
F5 link controller
 
TLDK - FD.io Sept 2016
TLDK - FD.io Sept 2016 TLDK - FD.io Sept 2016
TLDK - FD.io Sept 2016
 
CICS TS 5.1 Loader Domain enhancements
CICS TS 5.1 Loader Domain enhancementsCICS TS 5.1 Loader Domain enhancements
CICS TS 5.1 Loader Domain enhancements
 
Netlink-Optimization.pptx
Netlink-Optimization.pptxNetlink-Optimization.pptx
Netlink-Optimization.pptx
 
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIWLec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
Lec15 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- EPIC VLIW
 
Microchip's PIC Micro Controller
Microchip's PIC Micro ControllerMicrochip's PIC Micro Controller
Microchip's PIC Micro Controller
 
Lecture 03 basics of pic
Lecture 03 basics of picLecture 03 basics of pic
Lecture 03 basics of pic
 
.NET Core, ASP.NET Core Course, Session 3
.NET Core, ASP.NET Core Course, Session 3.NET Core, ASP.NET Core Course, Session 3
.NET Core, ASP.NET Core Course, Session 3
 

More from Linaro

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Linaro
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
Linaro
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Linaro
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
Linaro
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
Linaro
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
Linaro
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
Linaro
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
Linaro
 

More from Linaro (20)

Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea GalloDeep Learning Neural Network Acceleration at the Edge - Andrea Gallo
Deep Learning Neural Network Acceleration at the Edge - Andrea Gallo
 
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta VekariaArm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
Arm Architecture HPC Workshop Santa Clara 2018 - Kanta Vekaria
 
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua MoraHuawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
Huawei’s requirements for the ARM based HPC solution readiness - Joshua Mora
 
Bud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qaBud17 113: distribution ci using qemu and open qa
Bud17 113: distribution ci using qemu and open qa
 
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
OpenHPC Automation with Ansible - Renato Golin - Linaro Arm HPC Workshop 2018
 
HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018HPC network stack on ARM - Linaro HPC Workshop 2018
HPC network stack on ARM - Linaro HPC Workshop 2018
 
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
It just keeps getting better - SUSE enablement for Arm - Linaro HPC Workshop ...
 
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
Intelligent Interconnect Architecture to Enable Next Generation HPC - Linaro ...
 
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
Yutaka Ishikawa - Post-K and Arm HPC Ecosystem - Linaro Arm HPC Workshop Sant...
 
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
Andrew J Younge - Vanguard Astra - Petascale Arm Platform for U.S. DOE/ASC Su...
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening KeynoteHKG18-100K1 - George Grey: Opening Keynote
HKG18-100K1 - George Grey: Opening Keynote
 
HKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP WorkshopHKG18-318 - OpenAMP Workshop
HKG18-318 - OpenAMP Workshop
 
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainlineHKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
HKG18-501 - EAS on Common Kernel 4.14 and getting (much) closer to mainline
 
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and allHKG18-315 - Why the ecosystem is a wonderful thing, warts and all
HKG18-315 - Why the ecosystem is a wonderful thing, warts and all
 
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse HypervisorHKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
HKG18- 115 - Partitioning ARM Systems with the Jailhouse Hypervisor
 
HKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMUHKG18-TR08 - Upstreaming SVE in QEMU
HKG18-TR08 - Upstreaming SVE in QEMU
 
HKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8MHKG18-113- Secure Data Path work with i.MX8M
HKG18-113- Secure Data Path work with i.MX8M
 
HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation HKG18-120 - Devicetree Schema Documentation and Validation
HKG18-120 - Devicetree Schema Documentation and Validation
 
HKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted bootHKG18-223 - Trusted FirmwareM: Trusted boot
HKG18-223 - Trusted FirmwareM: Trusted boot
 

Recently uploaded

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 

SFO15-406: ARM FDPIC toolset, kernel & libraries for Cortex-M & Cortex-R mmuless cores

  • 1. ARM FDPIC Toolchain and ABI 2015, September 24th Linaro Connect San Francisco 2015 STMicroelectronics Compilation Group: C. Bertin, M. Guêné, C. Monat
  • 2. 11:15 Motivation Christian Bertin 11:20 Definitions 11:25 The FDPIC ABI 11:40 ARM FDPIC Toolchain 11:55 Conclusion/Perspectives 12:00 Questions and Answers Agenda Time Speaker Presentation 2
  • 3. Motivation •  ARM MMU-less cores can run “Embedded” Linux •  Kernel configured with NO-MMU in menuconfig •  µClibc library used as userland C library •  Multi-threading support (NPTL) with Thread Local Storage (TLS) •  MMU-less Linux requested from our Automotive and Microcontroller divisions •  Embedded applications can benefit from all Linux functionalities •  Coping with strong memory requirements on overall system •  Kernel and userland can be compiled in Thumb mode if possible •  µClibc contributes to smaller footprint •  Shared libraries are needed to run Linux applications and bring important savings by regrouping read-only sections (executable code and R/O data) into sharable “text” PIC segments •  Risk of memory fragmentation must be mitigated by the application 3
  • 4. Motivation (cont’d) •  Linux uses disjoint (virtual) address spaces for its processes •  For all processes, the text and data segments can be grouped in a way that they can always be loaded with a fixed relative position to each other. •  In all load modules, the offset from the start of the text segment to the start of the data segment is the same (constant). •  This property greatly simplifies the design of shared library machinery and is a foundation of the ARM PIC ABI for Linux shared libraries. •  W/o MMU, all processes share the same (physical) address space •  To benefit from share-able text among multiple processes, text and data segments are placed at arbitrary locations relative to each other. •  There is a need for a mechanism whereby executable code will always be able to find its corresponding data. •  This fairly complicates the design of shared library machinery and leads to the definition of the ARM FDPIC ABI for MMU-less Linux shared libraries. 4
  • 6. Definitions : PIC 6 •  Position Independent Code (PIC) executes regardless of its load address in memory without modification by the loader •  Commonly used for shared libraries (Dynamic Shared Objects : DSOs) •  Enable code sharing •  Executables in general are not entirely PIC •  But sometimes they are Position Independent Executable (PIE), i.e. binaries entirely made from PIC code (Security PaX or ExecShield) •  PIE is more costly than non-PIE •  PIC differs from relocatable : it requires NO processing of the text segment to make it suitable for running at any location PIC properties
  • 7. Definitions : PIC 7 •  Position-independence imposes the following requirements on code: •  All branches must be PC-relative •  Function calls are done through entries in the Procedure Linkage Table (PLT) •  Note : mandatory for extern or pre-emptible functions •  Code that forms an absolute address referring to any address in a DSO text segment is forbidden : it would have to be relocated at load-time, making it non- sharable •  All references to the data segment and to constants and literals in the text segment must be PC-relative •  Distance between text and data segments in a module can be constant •  Code that references symbols that are or may be imported from other load modules must use indirect addressing through a table, the Global Offset Table (GOT) •  Indirect load and stores must be used for items that may be dynamically bound. In both cases the indirection is done through the GOT allocated by the linker and initialized by the dynamic loader PIC code requirements
  • 8. Definitions : Dynamic Loader/Linker 8 •  The dynamic loader/linker is a component of the system that: •  locates all load modules belonging to an application •  loads them into memory •  binds the symbolic references amongst them by resolving ‘relocations’ From kernel to running application Kernel loads: Application Dynamic loader/ linker : ld-uClibc.so Dynamic loader: Application libc.so libfoo.so ld-uClibc.so Dynamic linker: Application • Patch data libc.so • Patch data libfoo.so • Patch data ld-uClibc.so Control to main startup Application libc.so libfoo.so ld-uClibc.so
  • 9. Definitions : Pre-emption and visibility 9 •  Because of symbol pre-emption, the final linkage value of a global symbol might not be known until run time •  Thus, defined symbols that are considered bound at link time must have the appropriate visibility attributes set: either protected or hidden. Compiler and assembler define attributes to set these properties. This enables code generation optimization. ELF visibility attributes Symbol Attribute Visible Pre-emptible “default” : locals No No “default” : globals, weaks Yes Yes “protected” Yes No “hidden” No No “internal” (processor specific) No No
  • 11. FDPIC ABI 11 •  Absence of MMU means: •  Processes share a common address space : ‘text’ and ‘data’ relative positions are arbitrary. Data segment address cannot be rematerialized with a PC-relative offset. •  Need a mechanism for executable code to find its matching data: •  A dedicated register, the FDPIC register, always holds the start address of the data segment of the current load module. •  Non-constant offset between code and data also complicates function pointer representation •  A function pointer is represented as the address of a Function Descriptor composed of two words: •  The address of the actual code to execute •  The corresponding data segment address (to be loaded into the FDPIC register) MMU-less environment PIC constraints
  • 12. FDPIC ABI 12 •  As both code and data must be relocatable, executable code shall not contain any instruction containing a pointer value •  Pointers to global data are indirectly referenced by the Global Offset Table (GOT) •  At load-time, pointers contained in the GOT are relocated by the dynamic linker to point to their actual location Pure PIC constraints
  • 13. FDPIC ABI 13 •  The FDPIC register : •  is used as a base register for accessing the Global Offset Table (GOT) and the Function Descriptor Table (FUNCDESC) of a given load module. The GOT and FUNCDESC tables are usually mapped at the start of the data segment •  is defined to be R9 •  is treated as caller-saved register (as opposed to callee-saved) •  R9 is considered to be platform-specific in the ARM ABI •  A caller function must save R9 before doing the actual call either on the stack or in one of the callee-saved registers: R4-R8, R10, R11, R13 •  A caller must restore R9 after the call, •  R9 in == R9 out The FDPIC register
  • 14. FDPIC ABI 14 •  Upon entry to a function, the FDPIC register (R9) contains the GOT address of the load module, where the function sits •  The FDPIC register can be materialized in three ways: 1.  by being inherited from the calling function in the case of a direct call to a function within the same ‘load module’ 2.  by being set either in a PLT entry or in inlined PLT code 3.  by being set from a function descriptor as part of an indirect function call •  A Procedure Linkage Table entry contains the code to establish the new R9 value, retrieve the address of the actual code and branch to it •  The function alone would have no means to materialize its own FDPIC value. FDPIC register materialization
  • 15. FDPIC ABI 15 •  Making direct calls to functions external to a given ‘load module’ requires a PLT entry (thunking) •  This inter-linkage code establishes the FDPIC value of the targeted function •  It would not be possible in the prologue of the function •  The PLT entry contains 1.  instructions for setting the FDPIC register, fetching the function’s start address and branching to it 2.  the offset of a function descriptor associated with the function •  The function descriptor is located in the caller’s FUNCDESC table at a fixed offset from the address specified by the FDPIC register of the caller Procedure Linkage Table a.k.a. PLT
  • 16. FDPIC ABI 16 •  A PLT entry may look like this (ip is the intra-procedure call scratch register) : •  A call to a function will look like this •  A PLT entry for foo is part of .plt sections of all modules calling foo. A function descriptor for foo is allocated in FDTs of all foo callers . Procedure Linkage Table entry code plt(foo): ldr ip, .L1 # foo’s descriptor offset add ip, ip, r9 # from caller’s FDT ldr r9, [ip, #4] # foo’s data address ldr pc, [ip] # foo’s code address L1. .word foo(GOTOFFFUNCDESC) bl foo(PLT)
  • 17. Use of PLT and FUNCDESC (global) 17 LM1 text segment data segment r9 bl foo(PLT) pc Calling foo in another load module foo @ foo’s GP LM2 text segment GOT GOT PLT’s foo foo’s code foo(GOTOFFFUNCDESC) FUNCDESC FUNCDESC pc’ r9’
  • 18. FDPIC ABI 18 •  The GOT contains words that hold the location of global data •  To access global data: •  code must use an FDPIC-relative load to fetch the data address from the GOT •  the data item is then accessed using the address obtained from the GOT •  The link editor is responsible for determining the precise GOT layout •  The GOT initialization is performed at load time by the dynamic linker •  Actually the dynamic linker resolves relocations Global Offset Table a.k.a. GOT
  • 19. Reading global data 19 •  Usual FDPIC sequence: Data foo is actually global ldr r2, .L3 # Load GOT offset ldr r3, [r9, r2] # Add FDPIC register and load foo’s @ ldr r3, [r3, #0] # Read foo .L4 .align 2 .L3 .word foo(GOT) •  The static linker will allocate memory into GOT area to hold foo’s address in all modules that access foo. •  It will replace foo(GOT) by an offset between the start of GOT and the GOT entry that contains foo’s address
  • 20. Use of GOT for global access 20 _GLOBAL_OFFSET_TABLE_ GOT @foo LM text segment data segment r9 foo’s address offset foo(GOT) pc foo Data item foo is in LM or in another load module
  • 21. Reading global data 21 •  Usual FDPIC sequence: bar is actually in the load module (static, protected, hidden, internal) ldr r3, .L3 # Load bar offset from b/o GOT add r3, r3, r9 # Add FDPIC register to create bar’s @ ldr r3, [r3, #0] # Read bar .L4 .align 2 .L3 .word bar(GOTOFF) •  The static linker will allocate memory into GOT area to hold bar’s address •  It will replace bar(GOTOFF) by the offset between start of GOT and bar’s address
  • 22. Use of R9 for local access 22 GOT text segment data segment r9 bar’s offset from PC pc bar bar’s offset from GP bar(GOTOFF) Data item bar is in the data segment of the load module
  • 23. Taking the address of a global function 23 •  FDPIC sequence: • The static linker will replace GOTFUNCDESC with the offset between GOT and the entry that contains function descriptor address. Function foo is actually global ldr r3, .L3 # Load GOT offset ldr r3, [r3, r9] # Read foo’s function descriptor @ : # FDPIC register + offset .L4 .align 2 .L3 .word foo(GOTFUNCDESC)
  • 24. Use of GOT and FUNCDESC (global) 24 LM1 text segment data segment r9 foo’s descr. offset pc function foo is in another load module foo’s desc. @ foo @ foo’s GP LM2 text segment GOT GOT PLT’s foo foo’s code foo(GOTFUNCDESC) FUNCDESC
  • 25. Taking the address of a local function 25 •  Usual FDPIC sequence: •  The static linker will allocate a descriptor for bar function and replace GOTOFFFUNCDESC by the offset between the GOT address and the address of bar’s function descriptor function bar is actually local (static, protected, hidden, internal) ldr r0, .L3 # Load bar’s descriptor entry offset .LPIC0: add r0, r0, r9 # Create bar’s FUNCDESC address # from FDPIC register .L4 .align 2 .L3 .word bar(GOTOFFFUNCDESC)
  • 26. Use of GOT and FUNCDESC (local) 26 text segment data segment r9 bar’s descriptor offset bar(GOTOFFFUNCDESC) pc bar is local and its function descriptor is in the FUNCDESC table FUNCDESC bar’s desc. @ GOT bar@ bar’s GP bar’s code
  • 27. Direct jumping into a function •  Usual PIC sequence: 27 •  FDPIC sequence: bl myFunction push r9 bl myFunction # may go thru PLT pop r9
  • 28. Indirect jumping : function given as a parameter •  PIC Usual sequence ( r0 : function @ ): 28 •  FDPIC sequence (r0 : function descriptor @) : … blx r0 … … ldr r3, [r0, #0] # Read function @ mov r4, r9 # Save FDPIC reg ldr r9, [r0, #4] # Read module # FDPIC value blx r3 # Call mov r9, r4 # Restore FDPIC reg …
  • 29. FDPIC ABI 29 Same parameter passing conventions but calling conventions are different and incompatible: •  the GOT address must be materialized in r9 upon function entry •  Mechanisms used for accessing global data are different and incompatible •  Function pointers are represented as 2-word function descriptors •  ‘text’ and ‘data’ segments may be relocated by different amounts •  The ARM FDPIC ABI adds specific relocations Summary of differences vs. ARM PIC ABI
  • 30. ARM Linux FDPIC Toolchains for ARM mmu-less Cortex-M and -R cores Credit: Mickael Guêné (ST ARM compilation team)
  • 31. ARM FDPIC Toolchain •  FDPIC toolchains exist for Blackfin, FR-V, Kalray processors •  ST has implemented the FDPIC ABI in GNU toolchains for ARM v7- R (CR4), ARM-v7-M (CM4) in Thumb and ARM modes. •  It generates code suitable for a Linux MMU-less execution environment •  It supports code generation for: •  Kernel compilation •  Shared libraries and dynamically linked binaries •  Fully static binaries •  It gives access to Linux functionality thru µclibc library, w/ multi-threading (NPTL, TLS), but limited fork() due to µclibc. •  It supports debugging of dynamically linked application with shared libs •  It supports userland emulation with qemu, proot and ARMv7 R and M Linux rootfs •  ST implemented the few kernel patches, which allow the kernel to load and correctly handle FDPIC binaries and DSOs. 31 At a glance
  • 32. FDPIC ABI Toolchain Support 32 •  The compiler is configured by default to generate FDPIC code •  The –mfdpic option is implicit in the compiler configuration •  The __FDPIC__ cpp macro is defined •  Note that the __PIC__ macro is also defined since PIC code is generated •  No change required in Linux applications build systems vs. FLAT or BFLAT •  The –mno-fdpic option can be used to disable the FDPIC code generation •  This is mandatory to build the Linux kernel •  It is possible to mix static and dynamic libraries and link your applications with them •  Use –Wl,-Bstatic –lfoo –Wl,-Bdynamic -lbar in the linker command •  It is possible to generate FDPIC static applications •  Use –static in the linker command •  All libraries can be linked in the final application •  Such a static binary can be also loaded at any arbitrary address •  LAZY binding is disabled, since it cannot be supported in a thread-safe way •  Race condition on non-atomic 64-bit load of function descriptor on ARM Driver options
  • 33. ARM FDPIC Toolchain •  ARM v7-R toolchain in production for Cortex R4 in an Automotive application •  ARM v7-M for Cortex M4 as mature as CR4 but not in production yet •  Both toolchains build a root file system with BuildRoot, which allows to boot Linux and run BusyBox and mplayer. •  proot, qemu and ARM v7-R and –M available rootfs images allow to execute an ARM Linux application on an Intel Linux workstation 33 Maturity
  • 34. ARM FDPIC Toolchain 34 Component Version Patches Kernel 3.10-3.12 •  Loader upgrade for FDPIC binaries •  ptrace extension (loadmap service + addresses of data and code segments) •  signal handlers (2 word address pointers) •  futex for MMU-less GCC 4.7.3 •  Large patch for FDPIC code generation •  4.8 prototype Binutils 2.22 •  New relocations in BFD •  Large generic patch in the linker GDB 7.5 •  server + client •  Handling of dynamic objects, FDPIC awareness µclibc library April 2012 •  generic patch in dynamic linker, •  start-up code, relocations, ARM architecture dependencies qemu 1.5 •  Same work in loader as in kernel Components
  • 36. What’s next? •  Source code is available on GitHub, not upstreamed yet: •  https://github.com/mickael-guene/fdpic_manifest •  For each component, two Git branches have been defined one for Cortex-R and one for Cortex-M •  Plan A, if community interest, we would have to follow these steps: •  Upstream kernel patches. They have not been submitted to kernel.org, just one bug entered on futex in mmu-less context •  Merge source modification to gcc, binutils, gdb, uclibc, qemu and parameterize machine dependencies (difficulty with binutils and uclibc) •  Merge binary toolchain to support both ARM Cortex-M and –R with multilib •  Investigate a replacement of uclibc with glibc or Musl C (http://www.musl-libc.org) •  Will it be possible to fix lazy procedure linkage support? •  Upgrade to newer versions of components for upstream •  Plan B: Share and maintain current sources on GitHub 36 Conclusion
  • 37. References 37 •  GNU binutils documentation •  http://www.gnu.org/software/binutils •  ARM ABI Specification •  http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042e/IHI0042E_aapcs.pdf •  Relocations •  http://www.mindfruit.co.uk/2012/06/relocations-relocations.html •  DSOs engineering •  http://www.akkadia.org/drepper/dsohowto.pdf Further reading
  • 39. Reading global data 39 •  Usual PIC sequence: Data is actually global ldr r2, .L3 # Load offset from PC to GOT .LPIC0 add r2, pc, r2 # Build GOT address ldr r3, .L3+4 # Load offset of foo entry in the GOT ldr r3, [r2, r3] # Read foo’s address ldr r3, [r3] # Read foo’s value .L4 .align 2 .L3 .word _GLOBAL_OFFSET_TABLE_-(.LPIC0+8) .word foo(GOT)
  • 40. Reading global data 40 •  Usual PIC sequence: Data is actually in the load module (static, protected, hidden, internal) ldr r3, .L7 # Load bar offset from ins. below .LPIC1: add r3, pc, r3 # Load bar’s address ldr r3, [r3, #0] # Read bar .L4 .align 2 .L7 .word bar-(.LPIC1+8)
  • 41. Taking the address of a global function 41 •  Usual PIC sequence: Function foo is actually global ldr r3, .L3 # Load global table offset .LPIC0: add r3, pc, r3 # Compute GOT address ldr r2, .L3+4 # Load foo entry offset ldr r3, [r3, r2] # Read foo’s address .L4 .align 2 .L3 .word _GLOBAL_OFFSET_TABLE_-(.LPIC0+8) .word foo(GOT)
  • 42. Taking the address of a local function 42 •  Usual PIC sequence: function bar is actually local (static, protected, hidden, internal) ldr r0, .L3 # Read offset between here and bar @ .LPIC0: add r0, pc, r0 # Create bar’s address from PC .L4 .align 2 .L3 .word bar-(.LPIC0+8)