SlideShare a Scribd company logo
Together, we can make
difference
The Theory, History and Future
of
System Linkers
Luba Tang
CEO & Founder, Skymizer Inc.
Outline
• The History
– Target Independent Linkers
– Post Optimizers
– Instrumentation Tools
• The Theory
– Linking Language
– Fragment-reference graph
• The Future
– for GPGPU; for virtual machines
– The bold project
唐文力 Luba Tang
CEO & Founder of Skymizer Inc.
Architect of MCLinker and GYM compiler
Compiler and Linker/Electronic System Level Design
Linker: The Elephant in the Room
• System linkers are very complicated. Only a few team can make
a full-fledge system linker.
– There are only four open source linkers that can be said full-fledge.
• GNU ld, Google gold can link Linux kernel
• Apple ld64 can link Mac OS X and iOS
• MCLinker can link BSD and Android system
• ELF linkers are super complicated. There are many
undocumented behaviors and target-specific behaviors.
– The other linkers are developed for more than three years and can not
be released. The linking problem is intricate.
• Although a lot of researches have proven linker itself can
optimize programs at a high performance level, developers still
not get benefit from these researches.
3
No Linker Really Optimize Programs
• MCLinker is 35% faster than the Google gold, and the Google
gold is ~200% faster than GNU ld
• If we turn on optimization flags, the output quality is almost
identical to all linkers (<3 %)
Comparison of ELF Linkers
GNU ld Google gold MCLinker
License GPLv3
Cannot be adopted by Android
UIUC BSD-Style
Target Platform All Linux mainstream
devices
ARM, X86, X86_64,
(Mips, SPARC)
All Android devices.
ARM, X86, Mips
(X86_64, X32, Mips64 and
Hexagon)
Object Format COFF, a.out, ELF ELF only ELF, extensible
Line of Code 500+K 100+K 50+K
Performance - Fast Fastest
Steadily x2 than GNU ld, x1.3
than Google gold
Intermediate
Representation
The BFD library for
reference graph
None Command line language and
reference graph
5
The Most-Recently Important
Target Independent Linker Research
lcc link
1982
?
1982
應該超過半數聽眾
還沒出生
LINK: A Machine-Independent Linker
• Team
– Christopher W. Fraser
– David R. Hanson
• 1982, Software Practice and Experience
– Define linker and object language (the predecessor of
linker script)
– Define three basic rules
• Define the condition of resolution
• Define the condition of absolute objects
• Define when to pull in a library
lcc link
1982
Linker; Post Optimizer; Instrumentation
lcc link
1982
?
後面的人好像重點
開始歪掉
OM: Code Optimization at Link-Time System
• Team
– Amitabh Srivastava
– David W. Wall
• 1992 Technical Report
– An approach to transform binary into RTL
– Use RTL to do inter-procedural optimization (5%~14%,
SPEC)
• Dead code elimination
• Loop Invariant Code Motion (LICM)
• 1994 SIGPLAN (3.8%, SPEC)
– Replace load instruction and eliminate GAT
– Reduce code size by 10% or more
OM
1992, 1994
OM: Code Optimization at Link-Time System
• Key Contributions of OM are
– OM identifies the problems to translate binary
back to assembly.
• PC-relative branches only
• Convert jump table back to case-statement
• No delayed branch, no delay slot
OM
1992, 1994
退休 Ya!
Spike: A successor or a competitor of OM
• DEC Team
– Robert Cohn
– David W. Goodwin
– P. Geoffrey Lowney
• 1996 Micro 29 (They call themselves
another OM)
– Hot Code Optimization to use shorter jump
– Works on Windows/NT Digital Alpha 3~8%
improvement
Spike
1996, 1997
RC
ATOM: Analysis Tools with OM (Best of PLDI 1979-1999)
• Dream Team - 1999
– Amitabh Srivastava (President of EMC)
– Alan Eustace (Senior VP of Google Search)
ATOM
1999
ATOM: Analysis Tools with OM (Best of PLDI 1979-1999)
• Key Contributions of ATOM are
– ATOM defines the use scenario and APIs of an
instrumentation tool
– Intel Pin follows APIs of ATOM.
• The rest contributions:
– Reducing procedure call overhead (caller-save and
callee-save)
– Use virtual machine to instrument program
• Defines the necessary memory layout
Chronicle of Linker Optimization
OM
1992, 1994
Spike
1996, 1997
RC
ATOM
1999
Alto
1999
ICFG
2000, 2001, 2002 Diablo
2003, 2005, 2007
Bruno
De
BUS
Pin
2005, 2007, 2011
RC
Alto: A Link-Time Optimizer for the Compaq
Alpha
• Team
– Robert Muth
– Saumya Debray
– Scott Watterson
– Keo De Bosschere
• Convert binary into control flow graph
– General approach
– The inspirer of ICFG
Alto
1999
Alto: A Link-Time Optimizer for the Compaq
Alpha
• Powerful Analysis and Optimization
– Simplification
• Dead code elimination
• Normalize operations who express the same semantics
• Use nops instead of remove instructions directly
– Analysis
• Machine level idioms for control transfer
• Live analysis (register level)
– Optimization
• Constant propagation (remove load, 6.4%)
• Dead code elimination
• Unused memory elimination (remove load, speed up 5.7%)
• Low level inlining (10% on average)
• Profile-directed code layout (6.5%)
• Instruction scheduling
ICFG: Interprocedural Control Flow Graph
• Team
– Saumya Debray
– William Evans
– Robert Muth
– Daniel Kastner
– Bjorn De Sutter
– Koen De Bosschere
• ACM Trans. on Programming Languages and
Systems, 2000
– Defines ICFG
– Collect compiler techniques for code compaction
– Reduce 30% on the average
ICFG
2000, 2001, 2002
Diablo: Post-Pass Optimization
• Team, Collection of Euro
– Bruno De Bus
– Saumya Debray
– William Evans
– Robert Muth
– Daniel Kastner
– Ludo Van Put
– Bjorn De Sutter
– Koen De Bosschere
• First complete post-pass optimizer
– A lot of following researches
Diablo
2002 - 2007
Bruno
De
BUS
Diablo: Post-Pass Optimization
• For code size, C++ have more opportunity than
C
– Sifting out the Mud: Low Level C++ Code Reuse,
OOPSLA’02
• Reduce 27~70%, 43% on average
– Combining Global Code and Data Compaction,
LCTES’01
• Reduce 23.6%~46.6%; 8% faster
• CFG reconstruction becomes mature
– Generic Control Flow reconstruction from Assembly
Code, LCTES’02
– Can handle delay slots and restricted indirection
Pin: Building Customized Program Analysis
Tools with Dynamic Instrumentation
• Team, Collection of USA, Intel
– Chi-Keung Luk
– Robert Cohn
– Robert Muth
– Harish Patil
– Artur Klauser
– Geoff Lowney
– Steven Wallace
– Vijay Janapa Reddi
– Kim Hazelwood
• Pin release the power of program analysis
– 1608 citation since 2005
– Heavily cited in GPGPU and HSA area
Pin
2005, 2007, 2011
RC
Pin: Building Customized Program Analysis
Tools with Dynamic Instrumentation
State-of-Art instrumentation tool
Pin Provides ATOM-like APIs
• User can write his own instrument and analysis code
Linker: The Elephant in the Room
• Although a lot of researches have proven
linker itself can optimize programs at a high
performance level, developers still not get
benefit from these researches.
24
Outline
• The History
– Target Independent Linkers
– Post Optimizers
– Instrumentation Tools
• The Theory
– Linking Language
– Fragment-reference graph
• The Future
– for GPGPU; for virtual machines
– The bold project
唐文力 Luba Tang
CEO & Founder of Skymizer Inc.
Architect of MCLinker and GYM compiler
Compiler and Linker/Electronic System Level Design
Introduction to Linker Intermediate Representation
• MCLinker is the first *ELF linker to provide an intermediate
representation (IR) for efficient transformation and analysis
• MCLinker provides IR on two levels
– Linker Command Line Language
– Fragment-Reference Graph
• Fragment is the basic linking unit, it can be
– A section (coarse granularity)
– A block of code or instructions (middle granularity)
– An individual symbol and its code/data (fine granularity)
• MCLinker can trade linking time for the output quality.
– The finer granularity,
• Fast, smaller program
• Longer link time
* Nick Kledzik invents the Atom IR in ld64 for MachO. ld64 inspires MCLinker IRs
The Linker Command Line Language
• Linker’s command line options is a kind of language
– The meaning of a option depends on
• their positions
• the other potions
– Some options have its own grammar
▪ Four categories of the options
– Input files
– Attributes of the input files
– Linker script options
– General options
▪ Examples
ld /tmp/xxx.o –lpthread
ld –as-needed ./yyy.so
ld –defsym=cgo13=0x224
ld –L/opt/lib –T ./my.x
The GNU ld Linker
• The GNU ld linker is an interpreter of the
command line language
– Processing is recursive.
– No clear separation between individual steps
– Binary File Descriptor (BFD) is the only IR
The Google gold Linker
• The Google gold linker separates linking into two stages
– Symbol resolution
– Relocation of instructions and data
• Although it has separated the linking processes, it does not provide reusable
IR for optimization and analysis
• The Google gold linker illustrates an efficient linking algorithm
– It’s x2 faster than the GNU ld linker
– Support multiple threads. Appropriate to cloud computing
MCLinker
• MCLinker separates the linking into four distinct stages
– Normalization – parse the command line language
– Resolution – resolve symbols
– Layout – relocate instructions and data
– Emission – emit file by various formats
• MCLinker provides two level intermediate representation (IR)
– The command line language level
– The reference graph level
Input Files on The Command Line
• An input file can be an object file, an archive, or a linker
script
• Some input files can be defined multiple times
• The result of linking depends on the positions of inputs
on the command line.
– Weak symbols are first-come-first-served
– COMDAT sections are first-come-first-served
• Two semantics to read input files
– INPUT( file1, file2, file3, ...)
– GROUP( archive1, archive2, archive3, ...)
• Archives in a group are searched repeatedly until no new
undefined references are created
$ ld a.o –start-group b.a c.a –end-group d.o e.o
The Input File Tree
• We can represent the input files on
the command line by a tree structure
– Vertices describes input files and
groups on the command line
• Object files
• Archives
• Linker scripts
• Entrances of groups
• Edges describe the relationships between
vertices
– Positional edges
– Inclusive edges
• Linkers resolve symbols by DFS and merge sections by BFS
• Example
$ ld a.o –start-group b.a c.a –end-group d.o e.o
Attributes of Input Files
• Attributes change the way that a linker handles the input files
• Attributes affect the input files after the attribute options
Functions Options Meanings
Whole archives --whole-archive Includes every file in the archive
Link against dynamic
libraries
-Bdynamic Search shared libraries for -l option
As needed --as-needed Only add the necessary shared libraries to
resolve symbols
Input format --format= The format of the following input files
Attributes in The Input File Tree
• Every input has a set of attributes
• In the MCLinker implementation,
we give every vertex a reference
to its attribute set
• If two vertices have identical
attributes, they can share a
common attribute set.
• Example
$ld ./a.o --whole-archive
--start-group ./b.a
./c.a --end-group
--no-whole-archive
./d.o ./e.o
Normalization
• Transform the command line language into the
input file tree
– Parse command line options
– Recognize input files to build up sub-trees
– Merge all sub-trees to a form the input file tree
Steps of Normalization
• Step of normalization
1. Parse the command line options
2. Recognize archives and linker
scripts
3. Read the linker scripts and
archives to create sub-trees
4. Merge all sub-trees
• Example
$ ld ./a.o ./b.a ./c.o
Traverse the Input File Tree
• MCLinker provides different iterators for different
purposes
– For symbol resolution
• Depth first search for correctness
– For section merging
• Breadth first search for cache locality of the output file
Resolution
• Transform the input file tree into the reference
graph
– Resolves symbols
– Reads relocation
– Builds the reference graph
Symbols and Relocations
• A fragment is a block of instruction code or data in a module
– A fragment may be
• a function,
• a label (Basic block),
• a 32-bit integer data, and so on.
• A defined symbol indicates a fragment
• A relocation represents an use-define relationship between two
fragments
define @bar()
…
add @a, 0x1, 0x2
…
@a = global i32 0
…
Module X Module Y
relocation
usedefine
Symbol
@a
Symbol
@bar
Fragment-Reference Graph (1/2)
• A reference is a symbolic linkage between two
fragments
– A reference is an directed edge from use to define
• MCLinker represents the input modules as a graph structure
– Vertices describe the fragments of modules
– Edges describe the references between two fragments
relocation
usedefine
symboldefine fragment use fragment
a reference
Fragment-Reference Graph (2/2)
• A Fragment-Reference Graph is a digraph, FRG = (V, E, S, O)
– V is a set of fragments
– E is a set of references, from use to define
– S is a set of define symbols. They are the entrances of the graph
– O is a set of exits and explains later.
__start__global
fragment
edge
Symbol Resolution
• Determine the topology of the reference graph
– Relocation is a plug
– Define symbol is a slot
– Symbol resolution connects plugs and slots.
• Symbols has a set of attributes to help linkers determine the correct
topology
relocation
use
define
symbol define fragment
use fragment Undefine symbol
define
symbol define fragment
define
Which
one?
Optimizations on the Fragment-Reference Graph
• Fragment stripping
– Remove unused fragment for shrink code size (Reachability
problem)
– Traditional linkers strip coarse sections. But MCLinker can
strips finer-grained fragments.
– The finer granularity, the smaller code size
• Branch optimization
– Replace high cost branch by low cost branch
– Optimizing by change of the relocation type
• Low-level inlining - ICF
• Fragment duplication for TLS optimization and copy
relocations
Layout
• To serialize the reference graph into a address
space
– Scan relocations
– Layout
– Apply relocations
Exits of The Fragment-Reference Graph
• A Fragment-Reference Graph is a digraph, FRG = (V, E, S, O)
– O is a set of exits. An exit represents a dynamic relocation to GOT.
– Represent to access external variables or to call an external function exits the FRG
• If the defining fragment is in an external module, then MCLinker will add exits
for the references to the outside module.
– We have no way to know the memory address of the external module until the load time
– We add the Global Offset Table (GOT) for the unknown addresses
– We add dynamic relocations for all entries of the GOT
– Loader will apply the dynamic relocations and set the correct address in the GOT.
– The program use the GOT to accesses the external module indirectly
__start
GOT
relocation
use define
relocation
exit
Layout
• Layout is a process to finalize the address of fragment and symbols
– Sorts FRG=(V, E, S, O) topologically
– Assigns addresses to {V, S, O}
• Before layout, we must calculate the sizes of all elements of the graph
– Relocation scanning
• Reserve exits and calculate the sizes of all exits
• Undefined global symbol, GOT, and dynamic relocations
– *Pre-layout
• Calculate the size of all fragments
• Calculate the size of all entrances
– Global symbols and the hash table
* MCLinker follows the Google gold linker’s naming. But pre-layout is opaque and may be renamed.
Apply relocation (1/2)
• Adjusts the content of using fragments
– Final addresses of symbol is known after layout
– Correct use fragment by accessed address
add @a , 0x1, 0x2
…
0x24 @a
…
Symbol Table Module Y
relocation
usedefine
0x24
Apply relocation (2/2)
• Replaces absolute addresses by PC-related offset if supported by the target
• Basic Relocation Formula
S – P + A
– S: the symbol value
– P: the place of the use instruction
– A: addend, adjustment (by the instruction format)
…
@a
…
add @a , 0x1, 0x2
S
P
S - P
A
address space
Optimizations on Layout
• Dynamic Prelinking
– If the system puts shared libraries at a fixed memory
location, we can fill GOT with fixed addresses to avoid
symbol look up in the loader
• Static Prelinking
– If the system puts shared libraries at a fixed memory
location, we can directly refer to the fixed addresses
without any exits
• Symbol Stripping
– Strip the undefined symbols which is not a exit
• Sections/functions/basic block Reordering
– Linker knows the address and can perform better
reordering
Emission
• Emits the module in the output formats
– Adds format information
– Writes down the IR
• In order to improve both cache and page locality, MCLinker
collects and performs most file operations in this stage.
– MCLinker copies the content in the inputs and applies the resolved
reference in this stage.
Outline
• The History
– Target Independent Linkers
– Post Optimizers
– Instrumentation Tools
• The Theory
– Linking Language
– Fragment-reference graph
• The Future
– for GPGPU; for virtual machines
– The bold project
唐文力 Luba Tang
CEO & Founder of Skymizer Inc.
Architect of MCLinker and GYM compiler
Compiler and Linker/Electronic System Level Design
The bold Project
Challenge: Unified Shared Memory of
Heterogeneous Many-Core System
• Installation time compilation
– GPGPU languages (OpenCL, CUDA, RenderScript)
– Virtual Machine (Dalvik, RenderScript)
• Heterogeneous Many-core System
– Universal ELF
Unified Loader
Modular Linker
GCC LLVM Dalvik RenderScript
ARM HSA GPU DSP
OpenCL
The bold Project
• BSD licensing linker
– General purpose linker/loader
– Focus on optimization
– Linking in parallel
• OA (Owner agreement) and CA (Committer
agreement)
– Avoid interest confliction between industry and
community.
– Legal person can not be an owner
Fortune favors the bold

More Related Content

What's hot

IRATI @ RINA Workshop 2014, Dublin
IRATI @ RINA Workshop 2014, DublinIRATI @ RINA Workshop 2014, Dublin
IRATI @ RINA Workshop 2014, Dublin
Eleni Trouva
 
Irati goals and achievements - 3rd RINA Workshop
Irati goals and achievements - 3rd RINA WorkshopIrati goals and achievements - 3rd RINA Workshop
Irati goals and achievements - 3rd RINA Workshop
Eleni Trouva
 
Introduction to course
Introduction to courseIntroduction to course
Introduction to course
nikit meshram
 
A Common Backend for Hardware Acceleration of DSLs on FPGA
A Common Backend for Hardware Acceleration of DSLs on FPGAA Common Backend for Hardware Acceleration of DSLs on FPGA
A Common Backend for Hardware Acceleration of DSLs on FPGA
NECST Lab @ Politecnico di Milano
 
[Dec./2017] My Personal/Professional Journey after Graduate Univ.
[Dec./2017] My Personal/Professional Journey after Graduate Univ.[Dec./2017] My Personal/Professional Journey after Graduate Univ.
[Dec./2017] My Personal/Professional Journey after Graduate Univ.
Hayoung Yoon
 
Experimental evaluation of a RINA prototype - GC 2014
Experimental evaluation of a RINA prototype - GC 2014Experimental evaluation of a RINA prototype - GC 2014
Experimental evaluation of a RINA prototype - GC 2014
Eleni Trouva
 
2020 osi 7 layers for grade12
2020 osi 7 layers for grade122020 osi 7 layers for grade12
2020 osi 7 layers for grade12
Osama Ghandour Geris
 
Syntax directed-translation
Syntax directed-translationSyntax directed-translation
Syntax directed-translation
Junaid Lodhi
 
Lecture1 introduction compilers
Lecture1 introduction compilersLecture1 introduction compilers
Lecture1 introduction compilers
Mahesh Kumar Chelimilla
 
Dublin mngmt140120
Dublin mngmt140120Dublin mngmt140120
Dublin mngmt140120
ICT PRISTINE
 
Compiler Design Lecture Notes
Compiler Design Lecture NotesCompiler Design Lecture Notes
Compiler Design Lecture Notes
FellowBuddy.com
 
AADL Overview: Brief and Pointless
AADL Overview: Brief and PointlessAADL Overview: Brief and Pointless
AADL Overview: Brief and Pointless
Ivan Ruchkin
 
Safe and Reliable Embedded Linux Programming: How to Get There
Safe and Reliable Embedded Linux Programming: How to Get ThereSafe and Reliable Embedded Linux Programming: How to Get There
Safe and Reliable Embedded Linux Programming: How to Get There
AdaCore
 

What's hot (14)

IRATI @ RINA Workshop 2014, Dublin
IRATI @ RINA Workshop 2014, DublinIRATI @ RINA Workshop 2014, Dublin
IRATI @ RINA Workshop 2014, Dublin
 
Irati goals and achievements - 3rd RINA Workshop
Irati goals and achievements - 3rd RINA WorkshopIrati goals and achievements - 3rd RINA Workshop
Irati goals and achievements - 3rd RINA Workshop
 
Introduction to course
Introduction to courseIntroduction to course
Introduction to course
 
DhevendranResume
DhevendranResumeDhevendranResume
DhevendranResume
 
A Common Backend for Hardware Acceleration of DSLs on FPGA
A Common Backend for Hardware Acceleration of DSLs on FPGAA Common Backend for Hardware Acceleration of DSLs on FPGA
A Common Backend for Hardware Acceleration of DSLs on FPGA
 
[Dec./2017] My Personal/Professional Journey after Graduate Univ.
[Dec./2017] My Personal/Professional Journey after Graduate Univ.[Dec./2017] My Personal/Professional Journey after Graduate Univ.
[Dec./2017] My Personal/Professional Journey after Graduate Univ.
 
Experimental evaluation of a RINA prototype - GC 2014
Experimental evaluation of a RINA prototype - GC 2014Experimental evaluation of a RINA prototype - GC 2014
Experimental evaluation of a RINA prototype - GC 2014
 
2020 osi 7 layers for grade12
2020 osi 7 layers for grade122020 osi 7 layers for grade12
2020 osi 7 layers for grade12
 
Syntax directed-translation
Syntax directed-translationSyntax directed-translation
Syntax directed-translation
 
Lecture1 introduction compilers
Lecture1 introduction compilersLecture1 introduction compilers
Lecture1 introduction compilers
 
Dublin mngmt140120
Dublin mngmt140120Dublin mngmt140120
Dublin mngmt140120
 
Compiler Design Lecture Notes
Compiler Design Lecture NotesCompiler Design Lecture Notes
Compiler Design Lecture Notes
 
AADL Overview: Brief and Pointless
AADL Overview: Brief and PointlessAADL Overview: Brief and Pointless
AADL Overview: Brief and Pointless
 
Safe and Reliable Embedded Linux Programming: How to Get There
Safe and Reliable Embedded Linux Programming: How to Get ThereSafe and Reliable Embedded Linux Programming: How to Get There
Safe and Reliable Embedded Linux Programming: How to Get There
 

Similar to 2013 Hello GCC:The Theory, History and Future of System Linkers

Ml also helps generic compiler ?
Ml also helps generic compiler ?Ml also helps generic compiler ?
Ml also helps generic compiler ?
Ryo Takahashi
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
Antonio García-Domínguez
 
01 intro-bps-2011
01 intro-bps-201101 intro-bps-2011
01 intro-bps-2011
mistercteam
 
EuroMPI 2013 presentation: McMPI
EuroMPI 2013 presentation: McMPIEuroMPI 2013 presentation: McMPI
EuroMPI 2013 presentation: McMPI
Dan Holmes
 
OliverStoneSWResume2015-05
OliverStoneSWResume2015-05OliverStoneSWResume2015-05
OliverStoneSWResume2015-05Oliver Stone
 
Onnc intro
Onnc introOnnc intro
Onnc intro
Luba Tang
 
Maniteja_Professional_Resume
Maniteja_Professional_ResumeManiteja_Professional_Resume
Maniteja_Professional_ResumeVaddi Maniteja
 
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...chiportal
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
Edge AI and Vision Alliance
 
Network architecure (3).pptx
Network architecure (3).pptxNetwork architecure (3).pptx
Network architecure (3).pptx
Kaythry P
 
Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!
Peter Hlavaty
 
ElixirでFPGAを設計する
ElixirでFPGAを設計するElixirでFPGAを設計する
ElixirでFPGAを設計する
Hideki Takase
 
Computing Without Computers - Oct08
Computing Without Computers - Oct08Computing Without Computers - Oct08
Computing Without Computers - Oct08
Ian Page
 
Gnu linux for safety related systems
Gnu linux for safety related systemsGnu linux for safety related systems
Gnu linux for safety related systemsDTQ4
 
infiniband.pdf
infiniband.pdfinfiniband.pdf
infiniband.pdf
AkashSuresh45
 
2015 bioinformatics python_introduction_wim_vancriekinge_vfinal
2015 bioinformatics python_introduction_wim_vancriekinge_vfinal2015 bioinformatics python_introduction_wim_vancriekinge_vfinal
2015 bioinformatics python_introduction_wim_vancriekinge_vfinal
Prof. Wim Van Criekinge
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
Jose Quesada (hiring)
 
System on Chip Design and Modelling Dr. David J Greaves
System on Chip Design and Modelling   Dr. David J GreavesSystem on Chip Design and Modelling   Dr. David J Greaves
System on Chip Design and Modelling Dr. David J Greaves
Satya Harish
 
Alessandro Abbruzzetti - Kernal64
Alessandro Abbruzzetti - Kernal64Alessandro Abbruzzetti - Kernal64
Alessandro Abbruzzetti - Kernal64
Scala Italy
 

Similar to 2013 Hello GCC:The Theory, History and Future of System Linkers (20)

Ml also helps generic compiler ?
Ml also helps generic compiler ?Ml also helps generic compiler ?
Ml also helps generic compiler ?
 
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a serviceCOMMitMDE'18: Eclipse Hawk: model repository querying as a service
COMMitMDE'18: Eclipse Hawk: model repository querying as a service
 
Introduction to multicore .ppt
Introduction to multicore .pptIntroduction to multicore .ppt
Introduction to multicore .ppt
 
01 intro-bps-2011
01 intro-bps-201101 intro-bps-2011
01 intro-bps-2011
 
EuroMPI 2013 presentation: McMPI
EuroMPI 2013 presentation: McMPIEuroMPI 2013 presentation: McMPI
EuroMPI 2013 presentation: McMPI
 
OliverStoneSWResume2015-05
OliverStoneSWResume2015-05OliverStoneSWResume2015-05
OliverStoneSWResume2015-05
 
Onnc intro
Onnc introOnnc intro
Onnc intro
 
Maniteja_Professional_Resume
Maniteja_Professional_ResumeManiteja_Professional_Resume
Maniteja_Professional_Resume
 
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
TRACK F: OpenCL for ALTERA FPGAs, Accelerating performance and design product...
 
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese..."Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
"Making Computer Vision Software Run Fast on Your Embedded Platform," a Prese...
 
Network architecure (3).pptx
Network architecure (3).pptxNetwork architecure (3).pptx
Network architecure (3).pptx
 
Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!
 
ElixirでFPGAを設計する
ElixirでFPGAを設計するElixirでFPGAを設計する
ElixirでFPGAを設計する
 
Computing Without Computers - Oct08
Computing Without Computers - Oct08Computing Without Computers - Oct08
Computing Without Computers - Oct08
 
Gnu linux for safety related systems
Gnu linux for safety related systemsGnu linux for safety related systems
Gnu linux for safety related systems
 
infiniband.pdf
infiniband.pdfinfiniband.pdf
infiniband.pdf
 
2015 bioinformatics python_introduction_wim_vancriekinge_vfinal
2015 bioinformatics python_introduction_wim_vancriekinge_vfinal2015 bioinformatics python_introduction_wim_vancriekinge_vfinal
2015 bioinformatics python_introduction_wim_vancriekinge_vfinal
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
System on Chip Design and Modelling Dr. David J Greaves
System on Chip Design and Modelling   Dr. David J GreavesSystem on Chip Design and Modelling   Dr. David J Greaves
System on Chip Design and Modelling Dr. David J Greaves
 
Alessandro Abbruzzetti - Kernal64
Alessandro Abbruzzetti - Kernal64Alessandro Abbruzzetti - Kernal64
Alessandro Abbruzzetti - Kernal64
 

Recently uploaded

MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
Osamah Alsalih
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
Pratik Pawar
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
AJAYKUMARPUND1
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
Robbie Edward Sayers
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
MdTanvirMahtab2
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
TeeVichai
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
BrazilAccount1
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Sreedhar Chowdam
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
Kamal Acharya
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
karthi keyan
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
ankuprajapati0525
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
Massimo Talia
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
ongomchris
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Teleport Manpower Consultant
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Dr.Costas Sachpazis
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
AafreenAbuthahir2
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
seandesed
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
ViniHema
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
Kamal Acharya
 

Recently uploaded (20)

MCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdfMCQ Soil mechanics questions (Soil shear strength).pdf
MCQ Soil mechanics questions (Soil shear strength).pdf
 
weather web application report.pdf
weather web application report.pdfweather web application report.pdf
weather web application report.pdf
 
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
Pile Foundation by Venkatesh Taduvai (Sub Geotechnical Engineering II)-conver...
 
HYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generationHYDROPOWER - Hydroelectric power generation
HYDROPOWER - Hydroelectric power generation
 
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)
 
Railway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdfRailway Signalling Principles Edition 3.pdf
Railway Signalling Principles Edition 3.pdf
 
AP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specificAP LAB PPT.pdf ap lab ppt no title specific
AP LAB PPT.pdf ap lab ppt no title specific
 
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&BDesign and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
Design and Analysis of Algorithms-DP,Backtracking,Graphs,B&B
 
Cosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdfCosmetic shop management system project report.pdf
Cosmetic shop management system project report.pdf
 
CME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional ElectiveCME397 Surface Engineering- Professional Elective
CME397 Surface Engineering- Professional Elective
 
The role of big data in decision making.
The role of big data in decision making.The role of big data in decision making.
The role of big data in decision making.
 
Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024Nuclear Power Economics and Structuring 2024
Nuclear Power Economics and Structuring 2024
 
space technology lecture notes on satellite
space technology lecture notes on satellitespace technology lecture notes on satellite
space technology lecture notes on satellite
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...
 
WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234WATER CRISIS and its solutions-pptx 1234
WATER CRISIS and its solutions-pptx 1234
 
Architectural Portfolio Sean Lockwood
Architectural Portfolio Sean LockwoodArchitectural Portfolio Sean Lockwood
Architectural Portfolio Sean Lockwood
 
power quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptxpower quality voltage fluctuation UNIT - I.pptx
power quality voltage fluctuation UNIT - I.pptx
 
Final project report on grocery store management system..pdf
Final project report on grocery store management system..pdfFinal project report on grocery store management system..pdf
Final project report on grocery store management system..pdf
 

2013 Hello GCC:The Theory, History and Future of System Linkers

  • 1. Together, we can make difference The Theory, History and Future of System Linkers Luba Tang CEO & Founder, Skymizer Inc.
  • 2. Outline • The History – Target Independent Linkers – Post Optimizers – Instrumentation Tools • The Theory – Linking Language – Fragment-reference graph • The Future – for GPGPU; for virtual machines – The bold project 唐文力 Luba Tang CEO & Founder of Skymizer Inc. Architect of MCLinker and GYM compiler Compiler and Linker/Electronic System Level Design
  • 3. Linker: The Elephant in the Room • System linkers are very complicated. Only a few team can make a full-fledge system linker. – There are only four open source linkers that can be said full-fledge. • GNU ld, Google gold can link Linux kernel • Apple ld64 can link Mac OS X and iOS • MCLinker can link BSD and Android system • ELF linkers are super complicated. There are many undocumented behaviors and target-specific behaviors. – The other linkers are developed for more than three years and can not be released. The linking problem is intricate. • Although a lot of researches have proven linker itself can optimize programs at a high performance level, developers still not get benefit from these researches. 3
  • 4. No Linker Really Optimize Programs • MCLinker is 35% faster than the Google gold, and the Google gold is ~200% faster than GNU ld • If we turn on optimization flags, the output quality is almost identical to all linkers (<3 %)
  • 5. Comparison of ELF Linkers GNU ld Google gold MCLinker License GPLv3 Cannot be adopted by Android UIUC BSD-Style Target Platform All Linux mainstream devices ARM, X86, X86_64, (Mips, SPARC) All Android devices. ARM, X86, Mips (X86_64, X32, Mips64 and Hexagon) Object Format COFF, a.out, ELF ELF only ELF, extensible Line of Code 500+K 100+K 50+K Performance - Fast Fastest Steadily x2 than GNU ld, x1.3 than Google gold Intermediate Representation The BFD library for reference graph None Command line language and reference graph 5
  • 6. The Most-Recently Important Target Independent Linker Research lcc link 1982 ? 1982 應該超過半數聽眾 還沒出生
  • 7. LINK: A Machine-Independent Linker • Team – Christopher W. Fraser – David R. Hanson • 1982, Software Practice and Experience – Define linker and object language (the predecessor of linker script) – Define three basic rules • Define the condition of resolution • Define the condition of absolute objects • Define when to pull in a library lcc link 1982
  • 8. Linker; Post Optimizer; Instrumentation lcc link 1982 ? 後面的人好像重點 開始歪掉
  • 9. OM: Code Optimization at Link-Time System • Team – Amitabh Srivastava – David W. Wall • 1992 Technical Report – An approach to transform binary into RTL – Use RTL to do inter-procedural optimization (5%~14%, SPEC) • Dead code elimination • Loop Invariant Code Motion (LICM) • 1994 SIGPLAN (3.8%, SPEC) – Replace load instruction and eliminate GAT – Reduce code size by 10% or more OM 1992, 1994
  • 10. OM: Code Optimization at Link-Time System • Key Contributions of OM are – OM identifies the problems to translate binary back to assembly. • PC-relative branches only • Convert jump table back to case-statement • No delayed branch, no delay slot OM 1992, 1994 退休 Ya!
  • 11. Spike: A successor or a competitor of OM • DEC Team – Robert Cohn – David W. Goodwin – P. Geoffrey Lowney • 1996 Micro 29 (They call themselves another OM) – Hot Code Optimization to use shorter jump – Works on Windows/NT Digital Alpha 3~8% improvement Spike 1996, 1997 RC
  • 12. ATOM: Analysis Tools with OM (Best of PLDI 1979-1999) • Dream Team - 1999 – Amitabh Srivastava (President of EMC) – Alan Eustace (Senior VP of Google Search) ATOM 1999
  • 13. ATOM: Analysis Tools with OM (Best of PLDI 1979-1999) • Key Contributions of ATOM are – ATOM defines the use scenario and APIs of an instrumentation tool – Intel Pin follows APIs of ATOM. • The rest contributions: – Reducing procedure call overhead (caller-save and callee-save) – Use virtual machine to instrument program • Defines the necessary memory layout
  • 14. Chronicle of Linker Optimization OM 1992, 1994 Spike 1996, 1997 RC ATOM 1999 Alto 1999 ICFG 2000, 2001, 2002 Diablo 2003, 2005, 2007 Bruno De BUS Pin 2005, 2007, 2011 RC
  • 15. Alto: A Link-Time Optimizer for the Compaq Alpha • Team – Robert Muth – Saumya Debray – Scott Watterson – Keo De Bosschere • Convert binary into control flow graph – General approach – The inspirer of ICFG Alto 1999
  • 16. Alto: A Link-Time Optimizer for the Compaq Alpha • Powerful Analysis and Optimization – Simplification • Dead code elimination • Normalize operations who express the same semantics • Use nops instead of remove instructions directly – Analysis • Machine level idioms for control transfer • Live analysis (register level) – Optimization • Constant propagation (remove load, 6.4%) • Dead code elimination • Unused memory elimination (remove load, speed up 5.7%) • Low level inlining (10% on average) • Profile-directed code layout (6.5%) • Instruction scheduling
  • 17. ICFG: Interprocedural Control Flow Graph • Team – Saumya Debray – William Evans – Robert Muth – Daniel Kastner – Bjorn De Sutter – Koen De Bosschere • ACM Trans. on Programming Languages and Systems, 2000 – Defines ICFG – Collect compiler techniques for code compaction – Reduce 30% on the average ICFG 2000, 2001, 2002
  • 18. Diablo: Post-Pass Optimization • Team, Collection of Euro – Bruno De Bus – Saumya Debray – William Evans – Robert Muth – Daniel Kastner – Ludo Van Put – Bjorn De Sutter – Koen De Bosschere • First complete post-pass optimizer – A lot of following researches Diablo 2002 - 2007 Bruno De BUS
  • 19. Diablo: Post-Pass Optimization • For code size, C++ have more opportunity than C – Sifting out the Mud: Low Level C++ Code Reuse, OOPSLA’02 • Reduce 27~70%, 43% on average – Combining Global Code and Data Compaction, LCTES’01 • Reduce 23.6%~46.6%; 8% faster • CFG reconstruction becomes mature – Generic Control Flow reconstruction from Assembly Code, LCTES’02 – Can handle delay slots and restricted indirection
  • 20. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation • Team, Collection of USA, Intel – Chi-Keung Luk – Robert Cohn – Robert Muth – Harish Patil – Artur Klauser – Geoff Lowney – Steven Wallace – Vijay Janapa Reddi – Kim Hazelwood • Pin release the power of program analysis – 1608 citation since 2005 – Heavily cited in GPGPU and HSA area Pin 2005, 2007, 2011 RC
  • 21. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation State-of-Art instrumentation tool
  • 22. Pin Provides ATOM-like APIs • User can write his own instrument and analysis code
  • 23.
  • 24. Linker: The Elephant in the Room • Although a lot of researches have proven linker itself can optimize programs at a high performance level, developers still not get benefit from these researches. 24
  • 25. Outline • The History – Target Independent Linkers – Post Optimizers – Instrumentation Tools • The Theory – Linking Language – Fragment-reference graph • The Future – for GPGPU; for virtual machines – The bold project 唐文力 Luba Tang CEO & Founder of Skymizer Inc. Architect of MCLinker and GYM compiler Compiler and Linker/Electronic System Level Design
  • 26. Introduction to Linker Intermediate Representation • MCLinker is the first *ELF linker to provide an intermediate representation (IR) for efficient transformation and analysis • MCLinker provides IR on two levels – Linker Command Line Language – Fragment-Reference Graph • Fragment is the basic linking unit, it can be – A section (coarse granularity) – A block of code or instructions (middle granularity) – An individual symbol and its code/data (fine granularity) • MCLinker can trade linking time for the output quality. – The finer granularity, • Fast, smaller program • Longer link time * Nick Kledzik invents the Atom IR in ld64 for MachO. ld64 inspires MCLinker IRs
  • 27. The Linker Command Line Language • Linker’s command line options is a kind of language – The meaning of a option depends on • their positions • the other potions – Some options have its own grammar ▪ Four categories of the options – Input files – Attributes of the input files – Linker script options – General options ▪ Examples ld /tmp/xxx.o –lpthread ld –as-needed ./yyy.so ld –defsym=cgo13=0x224 ld –L/opt/lib –T ./my.x
  • 28. The GNU ld Linker • The GNU ld linker is an interpreter of the command line language – Processing is recursive. – No clear separation between individual steps – Binary File Descriptor (BFD) is the only IR
  • 29. The Google gold Linker • The Google gold linker separates linking into two stages – Symbol resolution – Relocation of instructions and data • Although it has separated the linking processes, it does not provide reusable IR for optimization and analysis • The Google gold linker illustrates an efficient linking algorithm – It’s x2 faster than the GNU ld linker – Support multiple threads. Appropriate to cloud computing
  • 30. MCLinker • MCLinker separates the linking into four distinct stages – Normalization – parse the command line language – Resolution – resolve symbols – Layout – relocate instructions and data – Emission – emit file by various formats • MCLinker provides two level intermediate representation (IR) – The command line language level – The reference graph level
  • 31. Input Files on The Command Line • An input file can be an object file, an archive, or a linker script • Some input files can be defined multiple times • The result of linking depends on the positions of inputs on the command line. – Weak symbols are first-come-first-served – COMDAT sections are first-come-first-served • Two semantics to read input files – INPUT( file1, file2, file3, ...) – GROUP( archive1, archive2, archive3, ...) • Archives in a group are searched repeatedly until no new undefined references are created $ ld a.o –start-group b.a c.a –end-group d.o e.o
  • 32. The Input File Tree • We can represent the input files on the command line by a tree structure – Vertices describes input files and groups on the command line • Object files • Archives • Linker scripts • Entrances of groups • Edges describe the relationships between vertices – Positional edges – Inclusive edges • Linkers resolve symbols by DFS and merge sections by BFS • Example $ ld a.o –start-group b.a c.a –end-group d.o e.o
  • 33. Attributes of Input Files • Attributes change the way that a linker handles the input files • Attributes affect the input files after the attribute options Functions Options Meanings Whole archives --whole-archive Includes every file in the archive Link against dynamic libraries -Bdynamic Search shared libraries for -l option As needed --as-needed Only add the necessary shared libraries to resolve symbols Input format --format= The format of the following input files
  • 34. Attributes in The Input File Tree • Every input has a set of attributes • In the MCLinker implementation, we give every vertex a reference to its attribute set • If two vertices have identical attributes, they can share a common attribute set. • Example $ld ./a.o --whole-archive --start-group ./b.a ./c.a --end-group --no-whole-archive ./d.o ./e.o
  • 35. Normalization • Transform the command line language into the input file tree – Parse command line options – Recognize input files to build up sub-trees – Merge all sub-trees to a form the input file tree
  • 36. Steps of Normalization • Step of normalization 1. Parse the command line options 2. Recognize archives and linker scripts 3. Read the linker scripts and archives to create sub-trees 4. Merge all sub-trees • Example $ ld ./a.o ./b.a ./c.o
  • 37. Traverse the Input File Tree • MCLinker provides different iterators for different purposes – For symbol resolution • Depth first search for correctness – For section merging • Breadth first search for cache locality of the output file
  • 38. Resolution • Transform the input file tree into the reference graph – Resolves symbols – Reads relocation – Builds the reference graph
  • 39. Symbols and Relocations • A fragment is a block of instruction code or data in a module – A fragment may be • a function, • a label (Basic block), • a 32-bit integer data, and so on. • A defined symbol indicates a fragment • A relocation represents an use-define relationship between two fragments define @bar() … add @a, 0x1, 0x2 … @a = global i32 0 … Module X Module Y relocation usedefine Symbol @a Symbol @bar
  • 40. Fragment-Reference Graph (1/2) • A reference is a symbolic linkage between two fragments – A reference is an directed edge from use to define • MCLinker represents the input modules as a graph structure – Vertices describe the fragments of modules – Edges describe the references between two fragments relocation usedefine symboldefine fragment use fragment a reference
  • 41. Fragment-Reference Graph (2/2) • A Fragment-Reference Graph is a digraph, FRG = (V, E, S, O) – V is a set of fragments – E is a set of references, from use to define – S is a set of define symbols. They are the entrances of the graph – O is a set of exits and explains later. __start__global fragment edge
  • 42. Symbol Resolution • Determine the topology of the reference graph – Relocation is a plug – Define symbol is a slot – Symbol resolution connects plugs and slots. • Symbols has a set of attributes to help linkers determine the correct topology relocation use define symbol define fragment use fragment Undefine symbol define symbol define fragment define Which one?
  • 43. Optimizations on the Fragment-Reference Graph • Fragment stripping – Remove unused fragment for shrink code size (Reachability problem) – Traditional linkers strip coarse sections. But MCLinker can strips finer-grained fragments. – The finer granularity, the smaller code size • Branch optimization – Replace high cost branch by low cost branch – Optimizing by change of the relocation type • Low-level inlining - ICF • Fragment duplication for TLS optimization and copy relocations
  • 44. Layout • To serialize the reference graph into a address space – Scan relocations – Layout – Apply relocations
  • 45. Exits of The Fragment-Reference Graph • A Fragment-Reference Graph is a digraph, FRG = (V, E, S, O) – O is a set of exits. An exit represents a dynamic relocation to GOT. – Represent to access external variables or to call an external function exits the FRG • If the defining fragment is in an external module, then MCLinker will add exits for the references to the outside module. – We have no way to know the memory address of the external module until the load time – We add the Global Offset Table (GOT) for the unknown addresses – We add dynamic relocations for all entries of the GOT – Loader will apply the dynamic relocations and set the correct address in the GOT. – The program use the GOT to accesses the external module indirectly __start GOT relocation use define relocation exit
  • 46. Layout • Layout is a process to finalize the address of fragment and symbols – Sorts FRG=(V, E, S, O) topologically – Assigns addresses to {V, S, O} • Before layout, we must calculate the sizes of all elements of the graph – Relocation scanning • Reserve exits and calculate the sizes of all exits • Undefined global symbol, GOT, and dynamic relocations – *Pre-layout • Calculate the size of all fragments • Calculate the size of all entrances – Global symbols and the hash table * MCLinker follows the Google gold linker’s naming. But pre-layout is opaque and may be renamed.
  • 47. Apply relocation (1/2) • Adjusts the content of using fragments – Final addresses of symbol is known after layout – Correct use fragment by accessed address add @a , 0x1, 0x2 … 0x24 @a … Symbol Table Module Y relocation usedefine 0x24
  • 48. Apply relocation (2/2) • Replaces absolute addresses by PC-related offset if supported by the target • Basic Relocation Formula S – P + A – S: the symbol value – P: the place of the use instruction – A: addend, adjustment (by the instruction format) … @a … add @a , 0x1, 0x2 S P S - P A address space
  • 49. Optimizations on Layout • Dynamic Prelinking – If the system puts shared libraries at a fixed memory location, we can fill GOT with fixed addresses to avoid symbol look up in the loader • Static Prelinking – If the system puts shared libraries at a fixed memory location, we can directly refer to the fixed addresses without any exits • Symbol Stripping – Strip the undefined symbols which is not a exit • Sections/functions/basic block Reordering – Linker knows the address and can perform better reordering
  • 50. Emission • Emits the module in the output formats – Adds format information – Writes down the IR • In order to improve both cache and page locality, MCLinker collects and performs most file operations in this stage. – MCLinker copies the content in the inputs and applies the resolved reference in this stage.
  • 51. Outline • The History – Target Independent Linkers – Post Optimizers – Instrumentation Tools • The Theory – Linking Language – Fragment-reference graph • The Future – for GPGPU; for virtual machines – The bold project 唐文力 Luba Tang CEO & Founder of Skymizer Inc. Architect of MCLinker and GYM compiler Compiler and Linker/Electronic System Level Design
  • 52. The bold Project Challenge: Unified Shared Memory of Heterogeneous Many-Core System • Installation time compilation – GPGPU languages (OpenCL, CUDA, RenderScript) – Virtual Machine (Dalvik, RenderScript) • Heterogeneous Many-core System – Universal ELF Unified Loader Modular Linker GCC LLVM Dalvik RenderScript ARM HSA GPU DSP OpenCL
  • 53. The bold Project • BSD licensing linker – General purpose linker/loader – Focus on optimization – Linking in parallel • OA (Owner agreement) and CA (Committer agreement) – Avoid interest confliction between industry and community. – Legal person can not be an owner Fortune favors the bold