是否總感覺這世界在變,但是系統軟體卻不動如山?其實不盡如此,從風起雲湧的 AI 技術、VR、AR、自動車等未來科技,無一不是受系統軟體的改進而達成,不論是 Android ART, LLVM, Swift, GPGPU,均是由系統軟體的改善所推動的。
這篇 talk 是 Luba Tang 在許多地方所分享的「The Theory, History and Future of Systme Linkers」,藉由連結器的研發進展,讓人一窺未來的樣貌。
Update on IRATI technical work after month 6Eleni Trouva
Summary of the technical work done by the project during the first six months, in the following areas:
* Use cases description and requirements analysis
* RINA specifications
* Software design and implementation
* Experimental infrastructure setup
Update on IRATI technical work after month 6Eleni Trouva
Summary of the technical work done by the project during the first six months, in the following areas:
* Use cases description and requirements analysis
* RINA specifications
* Software design and implementation
* Experimental infrastructure setup
Field Programmable Gate Arrays (FPGAs) are configurable integrated circuits able to provide a good trade-off in terms of performance, power consumption, and flexibility with respect to other architectures, like CPUs, GPUs and ASICs. The main drawback in using FPGAs, however, is their steep learning curve. An emerging solution to this problem is to write algorithms in a Domain Specific Language (DSL) and to let the DSL compiler generate efficient code targeting FPGAs. This work proposes FROST, a unified backend that enables different DSL compilers to target FPGA architectures. Differently from other code generation frameworks targeting FPGA, FROST exploits a scheduling co-language that enables users to have full control over which optimizations to apply in order to generate efficient code (e.g. loop pipelining, array partitioning, vectorization). At first, FROST analyzes and manipulates the input Abstract Syntax Tree (AST) in order to apply FPGA-oriented transformations and optimizations, then generates a C/C++ implementation suitable for High-Level Synthesis (HLS) tools. Finally, the output of HLS phase is synthesized and implemented on the target FPGA using Xilinx SDAccel toolchain.
FROST currently supports as front-end Halide, an Image Processing DSL, and Tiramisu, a DSL optimizer, and allows to achieve significant speedups with respect to state-of-the-art FPGA implementations of the same algorithms.
[Dec./2017] My Personal/Professional Journey after Graduate Univ.Hayoung Yoon
Prof. Young-Bae Ko from Ajou Univ., South Korea, invites me to talk about my personal and professional journies after graduate university. The audience was fresh graduate students, including some international students. It was not easy to present because it was the first non-technical presentation or myself delivered in English. On the otherhand, it was a precious time in preparing slides for me to recall all good memories and joys during my Ph.D. and industrial career. As a student pursuing Ph.D., I made many failures. But every time I face them, there was someone who couraged and helped me to overcome. I am sharing these slides with gratitude for those who supported me to be an engineer.
gain knowledge, understand logic of network system and has ability to create a network.OSI – TCP/IP – modem – NIC , Explain OSI and TCP/IP data models; describe how to configure a NIC and a modem, Identify names, purposes, and characteristics of other technologies that are used to establish connectivity.
FellowBuddy.com is an innovative platform that brings students together to share notes, exam papers, study guides, project reports and presentation for upcoming exams.
We connect Students who have an understanding of course material with Students who need help.
Benefits:-
# Students can catch up on notes they missed because of an absence.
# Underachievers can find peer developed notes that break down lecture and study material in a way that they can understand
# Students can earn better grades, save time and study effectively
Our Vision & Mission – Simplifying Students Life
Our Belief – “The great breakthrough in your life comes when you realize it, that you can learn anything you need to learn; to accomplish any goal that you have set for yourself. This means there are no limits on what you can be, have or do.”
Like Us - https://www.facebook.com/FellowBuddycom
Safe and Reliable Embedded Linux Programming: How to Get ThereAdaCore
The talk provides an overview of techniques to design and implement reliable embedded applications. The goal is to achieve safe and analyzable behavior by construction, including handling parallel multiprocessor systems in an efficient and predictable way. The means to attain this objective is to statically configure the application to run on embedded linux platforms, and then to use run-time support to enforce constraints imposed to the system.
Keynote at COMMitMDE'18 showing the basic concepts behind Hawk, our past case studies, and some of our experience in designing the Hawk Thrift APIs for remote model querying.
Field Programmable Gate Arrays (FPGAs) are configurable integrated circuits able to provide a good trade-off in terms of performance, power consumption, and flexibility with respect to other architectures, like CPUs, GPUs and ASICs. The main drawback in using FPGAs, however, is their steep learning curve. An emerging solution to this problem is to write algorithms in a Domain Specific Language (DSL) and to let the DSL compiler generate efficient code targeting FPGAs. This work proposes FROST, a unified backend that enables different DSL compilers to target FPGA architectures. Differently from other code generation frameworks targeting FPGA, FROST exploits a scheduling co-language that enables users to have full control over which optimizations to apply in order to generate efficient code (e.g. loop pipelining, array partitioning, vectorization). At first, FROST analyzes and manipulates the input Abstract Syntax Tree (AST) in order to apply FPGA-oriented transformations and optimizations, then generates a C/C++ implementation suitable for High-Level Synthesis (HLS) tools. Finally, the output of HLS phase is synthesized and implemented on the target FPGA using Xilinx SDAccel toolchain.
FROST currently supports as front-end Halide, an Image Processing DSL, and Tiramisu, a DSL optimizer, and allows to achieve significant speedups with respect to state-of-the-art FPGA implementations of the same algorithms.
[Dec./2017] My Personal/Professional Journey after Graduate Univ.Hayoung Yoon
Prof. Young-Bae Ko from Ajou Univ., South Korea, invites me to talk about my personal and professional journies after graduate university. The audience was fresh graduate students, including some international students. It was not easy to present because it was the first non-technical presentation or myself delivered in English. On the otherhand, it was a precious time in preparing slides for me to recall all good memories and joys during my Ph.D. and industrial career. As a student pursuing Ph.D., I made many failures. But every time I face them, there was someone who couraged and helped me to overcome. I am sharing these slides with gratitude for those who supported me to be an engineer.
gain knowledge, understand logic of network system and has ability to create a network.OSI – TCP/IP – modem – NIC , Explain OSI and TCP/IP data models; describe how to configure a NIC and a modem, Identify names, purposes, and characteristics of other technologies that are used to establish connectivity.
FellowBuddy.com is an innovative platform that brings students together to share notes, exam papers, study guides, project reports and presentation for upcoming exams.
We connect Students who have an understanding of course material with Students who need help.
Benefits:-
# Students can catch up on notes they missed because of an absence.
# Underachievers can find peer developed notes that break down lecture and study material in a way that they can understand
# Students can earn better grades, save time and study effectively
Our Vision & Mission – Simplifying Students Life
Our Belief – “The great breakthrough in your life comes when you realize it, that you can learn anything you need to learn; to accomplish any goal that you have set for yourself. This means there are no limits on what you can be, have or do.”
Like Us - https://www.facebook.com/FellowBuddycom
Safe and Reliable Embedded Linux Programming: How to Get ThereAdaCore
The talk provides an overview of techniques to design and implement reliable embedded applications. The goal is to achieve safe and analyzable behavior by construction, including handling parallel multiprocessor systems in an efficient and predictable way. The means to attain this objective is to statically configure the application to run on embedded linux platforms, and then to use run-time support to enforce constraints imposed to the system.
Keynote at COMMitMDE'18 showing the basic concepts behind Hawk, our past case studies, and some of our experience in designing the Hawk Thrift APIs for remote model querying.
For the full video of this presentation, please visit:
http://www.embedded-vision.com/platinum-members/luxoft/embedded-vision-training/videos/pages/may-2016-embedded-vision-summit
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Alexey Rybakov, Senior Director at LUXOFT, presents the "Making Computer Vision Software Run Fast on Your Embedded Platform" tutorial at the May 2016 Embedded Vision Summit.
Many computer vision algorithms perform well on desktop class systems, but struggle on resource constrained embedded platforms. This how-to talk provides a comprehensive overview of various optimization methods that make vision software run fast on low power, small footprint hardware that is widely used in automotive, surveillance, and mobile devices. The presentation explores practical aspects of deep algorithm and software optimization such as thinning of input data, using dynamic regions of interest, mastering data pipelines and memory access, overcoming compiler inefficiencies, and more.
Ice Age melting down: Intel features considered usefull!Peter Hlavaty
Decades history of kernel exploitation, however still most used techniques are such as ROP. Software based approaches comes finally challenge this technique, one more successful than the others. Those approaches usually trying to solve far more than ROP only problem, and need to handle not only security but almost more importantly performance issues. Another common attacker vector for redirecting control flow is stack what comes from design of today’s architectures, and once again some software approaches lately tackling this as well. Although this software based methods are piece of nice work and effective to big extent, new game changing approach seems coming to the light. Methodology closing this attack vector coming right from hardware - intel. We will compare this way to its software alternatives, how one interleaving another and how they can benefit from each other to challenge attacker by breaking his most fundamental technologies. However same time we go further, to challenge those approaches and show that even with those technologies in place attackers is not yet in the corner.
A design methodology and a language framework which contributes to providing a solid, scalable framework for developing next-generation silicon-based systems.
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...Jose Quesada (hiring)
The machine learning libraries in Apache Spark are an impressive piece of software engineering, and are maturing rapidly. What advantages does Spark.ml offer over scikit-learn? At Data Science Retreat we've taken a real-world dataset and worked through the stages of building a predictive model -- exploration, data cleaning, feature engineering, and model fitting; which would you use in production?
The machine learning libraries in Apache Spark are an impressive piece of software engineering, and are maturing rapidly. What advantages does Spark.ml offer over scikit-learn?
At Data Science Retreat we've taken a real-world dataset and worked through the stages of building a predictive model -- exploration, data cleaning, feature engineering, and model fitting -- in several different frameworks. We'll show what it's like to work with native Spark.ml, and compare it to scikit-learn along several dimensions: ease of use, productivity, feature set, and performance.
In some ways Spark.ml is still rather immature, but it also conveys new superpowers to those who know how to use it.
The presentation provides an introduction to the emulation world, in particular to the mythical Commodore 64 and its peripherals, like disk drive, printer, cartridges. To truly emulate the software written for this 8-bit home computer it is mandatory to be much accurate as possible and reproduce every single aspect of the real machine, starting from the chips that compose the hardware architecture. Beside the emulation topics the presentation faces some Scala performance issues that come up when you have to optimize low level operations. At the end I'll show you a demo where we'll see the emulator running a game and a demo-scene, one of the hardest software to emulate.
Similar to 2013 Hello GCC:The Theory, History and Future of System Linkers (20)
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Final project report on grocery store management system..pdf
2013 Hello GCC:The Theory, History and Future of System Linkers
1. Together, we can make
difference
The Theory, History and Future
of
System Linkers
Luba Tang
CEO & Founder, Skymizer Inc.
2. Outline
• The History
– Target Independent Linkers
– Post Optimizers
– Instrumentation Tools
• The Theory
– Linking Language
– Fragment-reference graph
• The Future
– for GPGPU; for virtual machines
– The bold project
唐文力 Luba Tang
CEO & Founder of Skymizer Inc.
Architect of MCLinker and GYM compiler
Compiler and Linker/Electronic System Level Design
3. Linker: The Elephant in the Room
• System linkers are very complicated. Only a few team can make
a full-fledge system linker.
– There are only four open source linkers that can be said full-fledge.
• GNU ld, Google gold can link Linux kernel
• Apple ld64 can link Mac OS X and iOS
• MCLinker can link BSD and Android system
• ELF linkers are super complicated. There are many
undocumented behaviors and target-specific behaviors.
– The other linkers are developed for more than three years and can not
be released. The linking problem is intricate.
• Although a lot of researches have proven linker itself can
optimize programs at a high performance level, developers still
not get benefit from these researches.
3
4. No Linker Really Optimize Programs
• MCLinker is 35% faster than the Google gold, and the Google
gold is ~200% faster than GNU ld
• If we turn on optimization flags, the output quality is almost
identical to all linkers (<3 %)
5. Comparison of ELF Linkers
GNU ld Google gold MCLinker
License GPLv3
Cannot be adopted by Android
UIUC BSD-Style
Target Platform All Linux mainstream
devices
ARM, X86, X86_64,
(Mips, SPARC)
All Android devices.
ARM, X86, Mips
(X86_64, X32, Mips64 and
Hexagon)
Object Format COFF, a.out, ELF ELF only ELF, extensible
Line of Code 500+K 100+K 50+K
Performance - Fast Fastest
Steadily x2 than GNU ld, x1.3
than Google gold
Intermediate
Representation
The BFD library for
reference graph
None Command line language and
reference graph
5
7. LINK: A Machine-Independent Linker
• Team
– Christopher W. Fraser
– David R. Hanson
• 1982, Software Practice and Experience
– Define linker and object language (the predecessor of
linker script)
– Define three basic rules
• Define the condition of resolution
• Define the condition of absolute objects
• Define when to pull in a library
lcc link
1982
9. OM: Code Optimization at Link-Time System
• Team
– Amitabh Srivastava
– David W. Wall
• 1992 Technical Report
– An approach to transform binary into RTL
– Use RTL to do inter-procedural optimization (5%~14%,
SPEC)
• Dead code elimination
• Loop Invariant Code Motion (LICM)
• 1994 SIGPLAN (3.8%, SPEC)
– Replace load instruction and eliminate GAT
– Reduce code size by 10% or more
OM
1992, 1994
10. OM: Code Optimization at Link-Time System
• Key Contributions of OM are
– OM identifies the problems to translate binary
back to assembly.
• PC-relative branches only
• Convert jump table back to case-statement
• No delayed branch, no delay slot
OM
1992, 1994
退休 Ya!
11. Spike: A successor or a competitor of OM
• DEC Team
– Robert Cohn
– David W. Goodwin
– P. Geoffrey Lowney
• 1996 Micro 29 (They call themselves
another OM)
– Hot Code Optimization to use shorter jump
– Works on Windows/NT Digital Alpha 3~8%
improvement
Spike
1996, 1997
RC
12. ATOM: Analysis Tools with OM (Best of PLDI 1979-1999)
• Dream Team - 1999
– Amitabh Srivastava (President of EMC)
– Alan Eustace (Senior VP of Google Search)
ATOM
1999
13. ATOM: Analysis Tools with OM (Best of PLDI 1979-1999)
• Key Contributions of ATOM are
– ATOM defines the use scenario and APIs of an
instrumentation tool
– Intel Pin follows APIs of ATOM.
• The rest contributions:
– Reducing procedure call overhead (caller-save and
callee-save)
– Use virtual machine to instrument program
• Defines the necessary memory layout
14. Chronicle of Linker Optimization
OM
1992, 1994
Spike
1996, 1997
RC
ATOM
1999
Alto
1999
ICFG
2000, 2001, 2002 Diablo
2003, 2005, 2007
Bruno
De
BUS
Pin
2005, 2007, 2011
RC
15. Alto: A Link-Time Optimizer for the Compaq
Alpha
• Team
– Robert Muth
– Saumya Debray
– Scott Watterson
– Keo De Bosschere
• Convert binary into control flow graph
– General approach
– The inspirer of ICFG
Alto
1999
16. Alto: A Link-Time Optimizer for the Compaq
Alpha
• Powerful Analysis and Optimization
– Simplification
• Dead code elimination
• Normalize operations who express the same semantics
• Use nops instead of remove instructions directly
– Analysis
• Machine level idioms for control transfer
• Live analysis (register level)
– Optimization
• Constant propagation (remove load, 6.4%)
• Dead code elimination
• Unused memory elimination (remove load, speed up 5.7%)
• Low level inlining (10% on average)
• Profile-directed code layout (6.5%)
• Instruction scheduling
17. ICFG: Interprocedural Control Flow Graph
• Team
– Saumya Debray
– William Evans
– Robert Muth
– Daniel Kastner
– Bjorn De Sutter
– Koen De Bosschere
• ACM Trans. on Programming Languages and
Systems, 2000
– Defines ICFG
– Collect compiler techniques for code compaction
– Reduce 30% on the average
ICFG
2000, 2001, 2002
18. Diablo: Post-Pass Optimization
• Team, Collection of Euro
– Bruno De Bus
– Saumya Debray
– William Evans
– Robert Muth
– Daniel Kastner
– Ludo Van Put
– Bjorn De Sutter
– Koen De Bosschere
• First complete post-pass optimizer
– A lot of following researches
Diablo
2002 - 2007
Bruno
De
BUS
19. Diablo: Post-Pass Optimization
• For code size, C++ have more opportunity than
C
– Sifting out the Mud: Low Level C++ Code Reuse,
OOPSLA’02
• Reduce 27~70%, 43% on average
– Combining Global Code and Data Compaction,
LCTES’01
• Reduce 23.6%~46.6%; 8% faster
• CFG reconstruction becomes mature
– Generic Control Flow reconstruction from Assembly
Code, LCTES’02
– Can handle delay slots and restricted indirection
20. Pin: Building Customized Program Analysis
Tools with Dynamic Instrumentation
• Team, Collection of USA, Intel
– Chi-Keung Luk
– Robert Cohn
– Robert Muth
– Harish Patil
– Artur Klauser
– Geoff Lowney
– Steven Wallace
– Vijay Janapa Reddi
– Kim Hazelwood
• Pin release the power of program analysis
– 1608 citation since 2005
– Heavily cited in GPGPU and HSA area
Pin
2005, 2007, 2011
RC
21. Pin: Building Customized Program Analysis
Tools with Dynamic Instrumentation
State-of-Art instrumentation tool
24. Linker: The Elephant in the Room
• Although a lot of researches have proven
linker itself can optimize programs at a high
performance level, developers still not get
benefit from these researches.
24
25. Outline
• The History
– Target Independent Linkers
– Post Optimizers
– Instrumentation Tools
• The Theory
– Linking Language
– Fragment-reference graph
• The Future
– for GPGPU; for virtual machines
– The bold project
唐文力 Luba Tang
CEO & Founder of Skymizer Inc.
Architect of MCLinker and GYM compiler
Compiler and Linker/Electronic System Level Design
26. Introduction to Linker Intermediate Representation
• MCLinker is the first *ELF linker to provide an intermediate
representation (IR) for efficient transformation and analysis
• MCLinker provides IR on two levels
– Linker Command Line Language
– Fragment-Reference Graph
• Fragment is the basic linking unit, it can be
– A section (coarse granularity)
– A block of code or instructions (middle granularity)
– An individual symbol and its code/data (fine granularity)
• MCLinker can trade linking time for the output quality.
– The finer granularity,
• Fast, smaller program
• Longer link time
* Nick Kledzik invents the Atom IR in ld64 for MachO. ld64 inspires MCLinker IRs
27. The Linker Command Line Language
• Linker’s command line options is a kind of language
– The meaning of a option depends on
• their positions
• the other potions
– Some options have its own grammar
▪ Four categories of the options
– Input files
– Attributes of the input files
– Linker script options
– General options
▪ Examples
ld /tmp/xxx.o –lpthread
ld –as-needed ./yyy.so
ld –defsym=cgo13=0x224
ld –L/opt/lib –T ./my.x
28. The GNU ld Linker
• The GNU ld linker is an interpreter of the
command line language
– Processing is recursive.
– No clear separation between individual steps
– Binary File Descriptor (BFD) is the only IR
29. The Google gold Linker
• The Google gold linker separates linking into two stages
– Symbol resolution
– Relocation of instructions and data
• Although it has separated the linking processes, it does not provide reusable
IR for optimization and analysis
• The Google gold linker illustrates an efficient linking algorithm
– It’s x2 faster than the GNU ld linker
– Support multiple threads. Appropriate to cloud computing
30. MCLinker
• MCLinker separates the linking into four distinct stages
– Normalization – parse the command line language
– Resolution – resolve symbols
– Layout – relocate instructions and data
– Emission – emit file by various formats
• MCLinker provides two level intermediate representation (IR)
– The command line language level
– The reference graph level
31. Input Files on The Command Line
• An input file can be an object file, an archive, or a linker
script
• Some input files can be defined multiple times
• The result of linking depends on the positions of inputs
on the command line.
– Weak symbols are first-come-first-served
– COMDAT sections are first-come-first-served
• Two semantics to read input files
– INPUT( file1, file2, file3, ...)
– GROUP( archive1, archive2, archive3, ...)
• Archives in a group are searched repeatedly until no new
undefined references are created
$ ld a.o –start-group b.a c.a –end-group d.o e.o
32. The Input File Tree
• We can represent the input files on
the command line by a tree structure
– Vertices describes input files and
groups on the command line
• Object files
• Archives
• Linker scripts
• Entrances of groups
• Edges describe the relationships between
vertices
– Positional edges
– Inclusive edges
• Linkers resolve symbols by DFS and merge sections by BFS
• Example
$ ld a.o –start-group b.a c.a –end-group d.o e.o
33. Attributes of Input Files
• Attributes change the way that a linker handles the input files
• Attributes affect the input files after the attribute options
Functions Options Meanings
Whole archives --whole-archive Includes every file in the archive
Link against dynamic
libraries
-Bdynamic Search shared libraries for -l option
As needed --as-needed Only add the necessary shared libraries to
resolve symbols
Input format --format= The format of the following input files
34. Attributes in The Input File Tree
• Every input has a set of attributes
• In the MCLinker implementation,
we give every vertex a reference
to its attribute set
• If two vertices have identical
attributes, they can share a
common attribute set.
• Example
$ld ./a.o --whole-archive
--start-group ./b.a
./c.a --end-group
--no-whole-archive
./d.o ./e.o
35. Normalization
• Transform the command line language into the
input file tree
– Parse command line options
– Recognize input files to build up sub-trees
– Merge all sub-trees to a form the input file tree
36. Steps of Normalization
• Step of normalization
1. Parse the command line options
2. Recognize archives and linker
scripts
3. Read the linker scripts and
archives to create sub-trees
4. Merge all sub-trees
• Example
$ ld ./a.o ./b.a ./c.o
37. Traverse the Input File Tree
• MCLinker provides different iterators for different
purposes
– For symbol resolution
• Depth first search for correctness
– For section merging
• Breadth first search for cache locality of the output file
38. Resolution
• Transform the input file tree into the reference
graph
– Resolves symbols
– Reads relocation
– Builds the reference graph
39. Symbols and Relocations
• A fragment is a block of instruction code or data in a module
– A fragment may be
• a function,
• a label (Basic block),
• a 32-bit integer data, and so on.
• A defined symbol indicates a fragment
• A relocation represents an use-define relationship between two
fragments
define @bar()
…
add @a, 0x1, 0x2
…
@a = global i32 0
…
Module X Module Y
relocation
usedefine
Symbol
@a
Symbol
@bar
40. Fragment-Reference Graph (1/2)
• A reference is a symbolic linkage between two
fragments
– A reference is an directed edge from use to define
• MCLinker represents the input modules as a graph structure
– Vertices describe the fragments of modules
– Edges describe the references between two fragments
relocation
usedefine
symboldefine fragment use fragment
a reference
41. Fragment-Reference Graph (2/2)
• A Fragment-Reference Graph is a digraph, FRG = (V, E, S, O)
– V is a set of fragments
– E is a set of references, from use to define
– S is a set of define symbols. They are the entrances of the graph
– O is a set of exits and explains later.
__start__global
fragment
edge
42. Symbol Resolution
• Determine the topology of the reference graph
– Relocation is a plug
– Define symbol is a slot
– Symbol resolution connects plugs and slots.
• Symbols has a set of attributes to help linkers determine the correct
topology
relocation
use
define
symbol define fragment
use fragment Undefine symbol
define
symbol define fragment
define
Which
one?
43. Optimizations on the Fragment-Reference Graph
• Fragment stripping
– Remove unused fragment for shrink code size (Reachability
problem)
– Traditional linkers strip coarse sections. But MCLinker can
strips finer-grained fragments.
– The finer granularity, the smaller code size
• Branch optimization
– Replace high cost branch by low cost branch
– Optimizing by change of the relocation type
• Low-level inlining - ICF
• Fragment duplication for TLS optimization and copy
relocations
44. Layout
• To serialize the reference graph into a address
space
– Scan relocations
– Layout
– Apply relocations
45. Exits of The Fragment-Reference Graph
• A Fragment-Reference Graph is a digraph, FRG = (V, E, S, O)
– O is a set of exits. An exit represents a dynamic relocation to GOT.
– Represent to access external variables or to call an external function exits the FRG
• If the defining fragment is in an external module, then MCLinker will add exits
for the references to the outside module.
– We have no way to know the memory address of the external module until the load time
– We add the Global Offset Table (GOT) for the unknown addresses
– We add dynamic relocations for all entries of the GOT
– Loader will apply the dynamic relocations and set the correct address in the GOT.
– The program use the GOT to accesses the external module indirectly
__start
GOT
relocation
use define
relocation
exit
46. Layout
• Layout is a process to finalize the address of fragment and symbols
– Sorts FRG=(V, E, S, O) topologically
– Assigns addresses to {V, S, O}
• Before layout, we must calculate the sizes of all elements of the graph
– Relocation scanning
• Reserve exits and calculate the sizes of all exits
• Undefined global symbol, GOT, and dynamic relocations
– *Pre-layout
• Calculate the size of all fragments
• Calculate the size of all entrances
– Global symbols and the hash table
* MCLinker follows the Google gold linker’s naming. But pre-layout is opaque and may be renamed.
47. Apply relocation (1/2)
• Adjusts the content of using fragments
– Final addresses of symbol is known after layout
– Correct use fragment by accessed address
add @a , 0x1, 0x2
…
0x24 @a
…
Symbol Table Module Y
relocation
usedefine
0x24
48. Apply relocation (2/2)
• Replaces absolute addresses by PC-related offset if supported by the target
• Basic Relocation Formula
S – P + A
– S: the symbol value
– P: the place of the use instruction
– A: addend, adjustment (by the instruction format)
…
@a
…
add @a , 0x1, 0x2
S
P
S - P
A
address space
49. Optimizations on Layout
• Dynamic Prelinking
– If the system puts shared libraries at a fixed memory
location, we can fill GOT with fixed addresses to avoid
symbol look up in the loader
• Static Prelinking
– If the system puts shared libraries at a fixed memory
location, we can directly refer to the fixed addresses
without any exits
• Symbol Stripping
– Strip the undefined symbols which is not a exit
• Sections/functions/basic block Reordering
– Linker knows the address and can perform better
reordering
50. Emission
• Emits the module in the output formats
– Adds format information
– Writes down the IR
• In order to improve both cache and page locality, MCLinker
collects and performs most file operations in this stage.
– MCLinker copies the content in the inputs and applies the resolved
reference in this stage.
51. Outline
• The History
– Target Independent Linkers
– Post Optimizers
– Instrumentation Tools
• The Theory
– Linking Language
– Fragment-reference graph
• The Future
– for GPGPU; for virtual machines
– The bold project
唐文力 Luba Tang
CEO & Founder of Skymizer Inc.
Architect of MCLinker and GYM compiler
Compiler and Linker/Electronic System Level Design
52. The bold Project
Challenge: Unified Shared Memory of
Heterogeneous Many-Core System
• Installation time compilation
– GPGPU languages (OpenCL, CUDA, RenderScript)
– Virtual Machine (Dalvik, RenderScript)
• Heterogeneous Many-core System
– Universal ELF
Unified Loader
Modular Linker
GCC LLVM Dalvik RenderScript
ARM HSA GPU DSP
OpenCL
53. The bold Project
• BSD licensing linker
– General purpose linker/loader
– Focus on optimization
– Linking in parallel
• OA (Owner agreement) and CA (Committer
agreement)
– Avoid interest confliction between industry and
community.
– Legal person can not be an owner
Fortune favors the bold