Course Overview: System Software
• Introduction to System Software
• Compilers
• Assemblers
• Loaders and linkers
• Macro processors
• Character I/O under Windows
• File and directory management under Windows
• Process creation under Windows
• Inter-process communication under Windows
Course Objective
• Going behind the scenes to gain a deep understanding of how computers actually work.
• Understanding the relationship between system software and machine architecture.
• Understanding how system software helps program development (compilers, assemblers, linkers and loaders) and program execution (OS, process management, file management, device management).
• Getting basic knowledge of and experience with the Windows system through programming.
Text Book
• System Software: An Introduction to Systems Programming, Leland L. Beck, Addison-Wesley, 1996.
• Windows System Programming, Johnson M. Hart, Third Edition, Addison-Wesley, 2005.
• Course Notes.
Chapter Overview: Introduction to System Software
• Concepts to be learned
  – Application software
  – System software
  – Program development environment
    • Compilers, assemblers, linkers, debuggers
  – Program run-time environment
    • Operating systems, program loaders, program libraries
  – Source program, object program, executable program
System Software Definition
• System software consists of a variety of programs that support the operation of a computer (but exactly what?)
  – developing programs: simplify the programming environment by hiding machine-level complexity
  – running programs: enable efficient use of hardware by sharing
• New definition: system software provides a general program development environment in which programmers can create specific applications, and lets those applications use the system hardware efficiently
System Software vs. Application Software
• System software
  – Supports the operation and use of the computer itself
  – Machine dependent (though not in all features)
  – Compilers, assemblers, linkers, loaders, debuggers, OS
• Application software
  – Designed as a tool to solve a specific problem
  – No direct relation with the hardware
  – Web browsers, media players, office tools, image processors, messengers
• Text editor?
Software Environments
• Program development environment
  – Compilers, assemblers, linkers, debuggers
  – Integrated development environment (IDE)
  – IDE examples: Visual C++, JBuilder, Visual Basic
• Program run-time environment
  – Operating systems, program loaders, program libraries
  – Java run-time environment
Steps in Creating and Running a C Program
• C program code → Compiler → Assembly language program → Assembler → Object: machine language module
• Object: machine language module + Object: library routine (machine language) → Linker → Executable: machine language program
• Executable: machine language program → Loader → Memory
System Software for Program Development
[Figure: source programs (assembly with JSUB F1, C/C++ and Pascal with CALL F1) are translated by the assembler and the C/C++ and Pascal compilers into object programs (e.g. 4B101036); the linker and loader then place the code (e.g. 4B106036) in the memory of the computer hardware (processor(s), memory, I/O devices).]
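System Software for Program Running
[Figure: the OS sits between the running programs (PGM1, PGM2, PGM3) and the computer hardware. Its components are the device manager (I/O devices), the process and resource manager (processor(s)), the memory manager (memory), and the file manager.]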
Other System Software
• Window system
  – Provides a virtual terminal to an application program
  – Maps virtual terminal operations so that they apply to a specific physical region on a screen
• Database management system
  – Stores information on the computer’s permanent storage devices
  – Provides abstract data types (schema) and creates new application-specific software optimized for efficient queries/updates on the data according to the schema definition
Strategies of Learning System Software Functions
• For each type of system software, distinguish among:
  – Fundamental common features
  – Closely related machine-dependent features
  – Other common machine-independent features
  – Major design options for structuring a particular piece of software (e.g. single-pass versus multi-pass processing)
  – Unusual machine-dependent features (examples of implementations on actual machines)
Chapter 1: Compilers
• A compiler is a language translator: a program that translates programs written in a source language into equivalent programs in a target language.
• The source language is usually a high-level programming language, and the target language is usually the machine language of an actual computer.
[Figure: Source program → Compiler → Target program, with error messages as a second output; source and target are labelled "Diverse & Varied".]
• Implications:
  – recognize legal (and illegal) programs
  – generate correct code
  – manage storage of all variables and code
  – agreement on format for object (or assembly) code
Compilers
• What qualities are important in a compiler?
  1. Correct code
  2. Output runs fast
  3. Compiler runs fast
  4. Compile time proportional to program size
  5. Support for separate compilation
  6. Good diagnostics for syntax errors
  7. Works well with the debugger
  8. Good diagnostics for flow anomalies
  9. Cross-language calls
  10. Consistent, predictable optimization
Compiler Design
• At the highest level of abstraction, compilers are often partitioned into
  – a front end that deals only with language-specific issues, and
  – a back end that deals only with machine-specific issues.
The Many Phases of a Compiler
• The typical compiler consists of several phases, each of which passes its output to the next phase. It uses the Analysis-Synthesis model:
  – Analysis: convert source code into discrete, manageable "chunks" (strings → tokens → trees).
  – Synthesis: convert each chunk into a piece of target code (trees → intermediate code → target code).
• Phases, from source program to target program:
  1. Lexical Analyzer
  2. Syntax Analyzer
  3. Semantic Analyzer
  4. Intermediate Code Generator
  5. Code Optimizer
  6. Code Generator
• The Symbol-table Manager and the Error Handler work alongside the phases.
• Phases 1, 2, 3: Analysis. Phases 4, 5, 6: Synthesis.
The role of each compiler phase: Scanner
• The lexical phase (scanner) groups characters into lexical units or tokens (keyword, identifier, number, etc.)
  – The input to the lexical phase is a character stream. The output is a stream of tokens.
  – Regular expressions are used to define the tokens recognized by a scanner, e.g. digit -> 0|1|..|9, letter -> A|..|Z|a|..|z, and identifier -> letter (letter|digit)*.
  – The scanner can be implemented as a finite state machine.
• Example: Position := initial + rate * 60 ;
  – All of these are tokens: Position, :=, initial, +, rate, *, 60, ;
  – Blanks, line breaks, etc. are scanned out
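The scanner described above can be sketched in a few lines of Python. This is a minimal illustration, not a production lexer: the token class names (NUM, ID, ASSIGN, OP, SEMI) and the function name scan are assumptions made for the example, and only the operators that appear in the slide's statement are recognized.

```python
import re

# Token classes; the names are illustrative assumptions, not from the slides.
TOKEN_SPEC = [
    ("NUM",    r"[0-9]+"),                 # digit digit*
    ("ID",     r"[A-Za-z][A-Za-z0-9]*"),   # letter (letter|digit)*
    ("ASSIGN", r":="),
    ("OP",     r"[+*]"),
    ("SEMI",   r";"),
    ("SKIP",   r"\s+"),                    # blanks and line breaks are scanned out
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def scan(text):
    """Turn a character stream into a list of (token class, lexeme) pairs."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"illegal character {text[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens
```

Calling scan("Position := initial + rate * 60 ;") yields eight tokens, one for each underlined item in the slide's example.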
The role of each compiler phase: Parser
• The parser recognizes whether a program (or sentence) is grammatically well formed. It groups tokens into syntactical units.
  – The output of the parser is a parse tree representation of the program.
  – Context-free grammars are used to define the program structure recognized by a parser.
• Example parse tree for position := initial + rate * 60 (children indented under their parent):
  assignment statement
    identifier: position
    :=
    expression
      expression: identifier initial
      +
      expression
        expression: identifier rate
        *
        expression: number 60
• Nodes of the tree are constructed using a grammar for the language
What is a Grammar?
• A grammar is a set of rules which govern the interdependencies and structure among the tokens
  – statement is an assignment statement, or while statement, or if statement, or ...
  – assignment statement is an identifier := expression ;
  – expression is an (expression), or expression + expression, or expression * expression, or number, or identifier, or ...
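As a rough illustration of how such rules drive a parser, here is a small recursive-descent sketch in Python. It makes two assumptions that are not in the slides: the ambiguous rule "expression + expression | expression * expression" is rewritten into the usual expression/term/factor layering so that * binds tighter than +, and the input is a list of already-separated token strings. The name parse_assignment and the nested-tuple tree shape are hypothetical.

```python
def parse_assignment(tokens):
    """Parse 'identifier := expression ;' and return a nested-tuple tree."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, found {tok!r}")
        pos += 1
        return tok

    def factor():                      # factor is (expression) | number | identifier
        tok = take()
        if tok == "(":
            node = expression()
            take(")")
            return node
        if tok.isdigit():
            return ("number", tok)
        if tok.isidentifier():
            return ("identifier", tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():                        # term is factor * factor * ...
        node = factor()
        while peek() == "*":
            take()
            node = ("*", node, factor())
        return node

    def expression():                  # expression is term + term + ...
        node = term()
        while peek() == "+":
            take()
            node = ("+", node, term())
        return node

    target = take()
    if not target.isidentifier():
        raise SyntaxError(f"assignment target must be an identifier, found {target!r}")
    take(":=")
    tree = (":=", ("identifier", target), expression())
    take(";")
    return tree
```

For the token list "position := initial + rate * 60 ;".split(), the result nests rate * 60 under the + node, which is the tree shape used in the slides.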
The role of each compiler phase: Semantic
• The semantic analysis phase analyzes the parse tree for context-sensitive information, often called the static semantics.
• Type checking: legality of operands
  – Real := int + char ;
  – A[int] := A[real] + int ;
  – while char <> int do
• The output of the semantic analysis phase is an annotated parse tree (augmented with semantic actions).
• Example (trees written in prefix form):
  – Compressed tree: := (position, + (initial, * (rate, 60)))
  – With conversion action: := (position, + (initial, * (rate, inttoreal(60))))
Symbol Table/Error Handling• Symbol Table Creation / Maintenance – Contains Info on Each “Meaningful” Token, Typically Identifiers – Data Structure Created / Initialized During Lexical Analysis – Utilized / Updated During Later Analysis & Synthesis• Error Handling – Detection of Different Errors Which Correspond to All Phases – What Kinds of Errors Are Found During the Analysis Phase or Synthesis Phase? – What Happens When an Error Is Found?
The role of each compiler phase: Intermediate Code Generation
• Uses an abstract machine version of the code, independent of the architecture
• Easy to produce, and easy to turn into final, machine-dependent code
• Three-address code: a "portable" assembly-like language
  – Every memory location can act like a register
  – At most three operands per instruction
• Example:
    temp1 := inttoreal(60)
    temp2 := id3 * temp1
    temp3 := id2 + temp2
    id1 := temp3
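A minimal Python sketch of how three-address code can be produced from an annotated tree follows. The tree encoding (nested tuples with string leaves) and the function name gen_three_address are assumptions for this example; a real compiler would carry type information and a symbol table.

```python
def gen_three_address(tree):
    """Emit three-address code for a tree of the form (':=', target, expression)."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"temp{counter}"

    def walk(node):
        if isinstance(node, str):          # leaf: identifier or constant
            return node
        op = node[0]
        if op == "inttoreal":              # conversion action inserted by semantic analysis
            arg = walk(node[1])
            temp = new_temp()
            code.append(f"{temp} := inttoreal({arg})")
            return temp
        left = walk(node[1])               # binary operator: evaluate both sides first
        right = walk(node[2])
        temp = new_temp()
        code.append(f"{temp} := {left} {op} {right}")
        return temp

    _, target, expression = tree
    code.append(f"{target} := {walk(expression)}")
    return code

# id1 := id2 + id3 * inttoreal(60)
EXAMPLE = (":=", "id1", ("+", "id2", ("*", "id3", ("inttoreal", "60"))))
```

gen_three_address(EXAMPLE) produces the same four statements as the slide, each with at most three operands.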
The role of each compiler phase: Code Optimization & Code Generator
• Optimizer:
  – Find more efficient ways to execute code
  – Replace code with more efficient statements
• Code generator:
  – Translate the optimized intermediate code into target machine instructions
  – Assign registers to the values used by those instructions
• Output of the code optimizer:
    temp1 := id3 * 60.0
    id1 := id2 + temp1
• Output of the final code generator:
    movf id3, r2
    mulf #60.0, r2
    movf id2, r1
    addf r2, r1
    movf r1, id1
Compiler Passes
• Number of passes
  – Single pass: read the input file, write the output file. It is preferred.
  – Multiple passes: each pass may consist of several phases. It is easier than single pass, but less efficient because it needs more memory.
Chapter 2: Assemblers
• Concepts to be learned
  – Assembler directives, forward references, two-pass assembly
  – Opcode table and symbol table
  – Two-pass assembly process and location counter
  – Program-counter relative and base relative addressing
  – Program relocation and modification records
  – Literals, literal pool, and literal table
  – Program blocks and multiple location counters
  – Control sections and independent assembly/compilation
  – External references and external definitions
  – One-pass and multi-pass assemblers
Real Machines
• Machine architectures differ in:
  – Machine code
  – Instruction formats
  – Addressing modes
  – Registers
• Complex Instruction Set Computers (CISC)
  – Relatively large and complicated instruction set; more instruction formats, instruction lengths, and addressing modes
  – Hardware implementation is complex
  – Examples: VAX and Intel x86
• Reduced Instruction Set Computers (RISC)
  – Simplified design, faster and less expensive processor development, greater reliability, faster instruction execution times
  – Examples: Sun SPARC and PowerPC
Simplified Instructional Computer (SIC) Architecture• Why the simplified instructional computer – A hypothetical computer designed to include common hardware features while avoiding irrelevant complexities – Separate the central concepts of system software from the implementation details associated with a particular machine – A good starting point to begin the design of system software for a new or unfamiliar computer. – Two versions (upward compatible) • Standard model • XE version (extra equipment)
SIC Machine Architecture
• Memory
  – 8-bit bytes; 3-byte words (24 bits); byte addresses; total of 32,768 (2^15) bytes in the memory
• Registers: 5 registers, each 24 bits in length
  Mnemonic  Number  Special use
  A         0       Accumulator; used for arithmetic operations
  X         1       Index register; used for addressing (offset)
  L         2       Linkage register; the Jump to Subroutine (JSUB) instruction stores the return address here
  PC        8       Program counter; contains the address of the next instruction to be fetched for execution
  SW        9       Status word; contains a variety of information, including a Condition Code (CC)
SIC Machine Architecture (continue)
• Data formats
  – Integers are stored as 24-bit binary numbers
  – Characters are stored using 8-bit ASCII codes
  – No floating-point hardware
• Instruction format (24 bits)
    opcode (8 bits) | x (1 bit) | address (15 bits)
  – Opcodes all end with 00 (the two low-order bits)
  – Flag bit x indicates indexed-addressing mode
• Notation: ( ) denotes the contents of a register or a memory location
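The 8/1/15-bit layout can be checked with a short Python sketch. The helper name encode_sic is hypothetical. The opcode values used in the usage note (LDA = 00, STL = 14, JSUB = 48) can be read off the object code in the course's own listing; STCH = 54 and the BUFFER address 1039 are assumptions taken from the standard SIC instruction set and the textbook example, not from these slides.

```python
def encode_sic(opcode, address, indexed=False):
    """Pack opcode (8 bits), x flag (1 bit), and address (15 bits) into 6 hex digits."""
    if not 0 <= opcode <= 0xFF:
        raise ValueError("opcode must fit in 8 bits")
    if not 0 <= address < 2 ** 15:
        raise ValueError("address must fit in 15 bits")
    word = (opcode << 16) | (int(indexed) << 15) | address
    return f"{word:06X}"
```

For example, encode_sic(0x14, 0x1033) gives 141033 (STL RETA), and encode_sic(0x54, 0x1039, indexed=True) gives 549039, where the leading 9 of the address field is the x bit set on top of address 1039.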
SIC Machine Architecture (continue)• Instruction set – Load and store registers • LDA, LDX, STA, STX – Integer arithmetic operations (involve register A and a word in memory, save result in A) • ADD, SUB, MUL, DIV – Comparison (involves register A and a word in memory, save result in the condition code (CC) of SW) • COMP – Conditional jump instructions (according to CC) • JLE, JEQ, JGT – Subroutine linkage • JSUB (jumps and places the return address in register L)
SIC Machine Architecture (continue)• Input and output – Transfer 1 byte at a time to or from the rightmost 8 bits of register A – Each device is assigned a unique 8-bit code – 3 I/O instructions: Test Device (TD), Read Data (RD) and Write Data (WD) • Test device (TD): – Test whether the device is ready to send/receive – Test result is set in CC • Read data (RD): read one byte from the device to register A • Write data (WD): write one byte from register A to the device – Repeat for each byte, time consuming
SIC/XE Machine Architecture
• Increased memory
  – Total of 1 megabyte (2^20 bytes)
• Additional registers
  Mnemonic  Number  Special use
  B         3       Base register; used for addressing
  S         4       General working register
  T         5       General working register
  F         6       Floating-point accumulator (48 bits)
• Additional data formats
  – 48-bit floating-point data type:
      s (1 bit) | exponent (11 bits) | fraction (36 bits)
SIC/XE Machine Architecture (continue)
• Varied instruction formats (field widths in bits)
  – Format 1 (1 byte):  op (8)
  – Format 2 (2 bytes): op (8) | r1 (4) | r2 (4)
  – Format 3 (3 bytes): op (6) | n (1) | i (1) | x (1) | b (1) | p (1) | e (1) | disp (12)
  – Format 4 (4 bytes): op (6) | n (1) | i (1) | x (1) | b (1) | p (1) | e (1) | address (20)
• Flag bits n, i, x, b, p indicate addressing modes
SIC/XE Machine Architecture (continue)
• Addressing modes, flag values, and target address (TA)
  Mode           Flag values     TA
  Direct         i = 1, n = 1    TA = disp (format 3); TA = address (format 4)
  Base relative  b = 1, p = 0    TA = (B) + disp; only with format 3
  PC relative    b = 0, p = 1    TA = (PC) + disp; only with format 3
  Immediate      i = 1, n = 0    TA = operand value
  Indirect       i = 0, n = 1    TA = (disp) (format 3); TA = (address) (format 4)
  Simple         i = 0, n = 0    TA = disp [15 bits]; b, p, e are part of the address field; upward compatible with SIC
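To see how the flag bits end up in the object code, the following Python sketch packs a format 3 or format 4 instruction. The function name encode_xe is hypothetical, and the two sample encodings in the usage note (STL with a PC-relative displacement of 02D, and +JSUB to address 01036) are assumed from the textbook's SIC/XE example rather than taken from these slides.

```python
def encode_xe(opcode, n, i, x, b, p, e, field):
    """Pack a SIC/XE format 3 (e = 0) or format 4 (e = 1) instruction as hex.

    opcode is the 8-bit opcode whose two low-order bits are replaced by n and i;
    field is the 12-bit disp (format 3) or the 20-bit address (format 4).
    """
    width = 20 if e else 12
    if not 0 <= field < (1 << width):
        raise ValueError(f"field does not fit in {width} bits")
    first_byte = (opcode & 0xFC) | (n << 1) | i      # op (6 bits) + n + i
    flags = (x << 3) | (b << 2) | (p << 1) | e       # x, b, p, e
    word = (first_byte << (4 + width)) | (flags << width) | field
    digits = 8 if e else 6
    return f"{word:0{digits}X}"
```

encode_xe(0x14, 1, 1, 0, 0, 1, 0, 0x02D) returns 17202D, and encode_xe(0x48, 1, 1, 0, 0, 0, 1, 0x01036) returns 4B101036, the same value that appears in the program development figure earlier in the course.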
SIC/XE Machine Architecture (continue)
• Additional instruction set
  – Load and store new registers (LDB, STB, etc.)
  – Floating-point arithmetic operations (ADDF, SUBF, MULF, DIVF)
  – Register move (RMO)
  – Register-to-register arithmetic operations (ADDR, SUBR, MULR, DIVR)
  – Special supervisor call instruction (SVC): generates an interrupt to communicate with the OS
• Additional input and output feature
  – Provides I/O channels, overlapping computing with I/O
  – Instructions SIO, TIO, and HIO are used to start, test, and halt the operation of I/O channels
SIC/XE Instruction Set
[Table of the instruction set not reproduced. Legend: X = only for XE; C = sets CC; F = floating-point; P = privileged]
SIC Programming Examples (1) Data Movement
[Figure: SIC and SIC/XE listings, annotated to show the address labels, mnemonic opcodes, operands, and comments.]
• SIC: assembler directives for defining storage
• SIC/XE: immediate addressing makes the program faster due to fewer memory references
SIC Programming Examples (3) Looping and Indexing: part I
• SIC: copy one 11-byte string to another
SIC Programming Examples (4) Looping and Indexing: part II
• SIC: GAMMA[i] ← ALPHA[i] + BETA[i]
Basic Assembler Functions• Assembler handles mnemonic operation codes, constants, literals, directives and addressing modes• Simple assembler and the assembly process (Role of Assembler) – Convert mnemonic operation codes to their machine language equivalents – Convert symbolic operands to their equivalent machine addresses – Build the machine instructions in the proper format – Convert the data constants specified in the source program into their internal machine representations – Write the object program and the assembly listing
Basic Assembler Functions (continue)• Assembler directives (Fig.2.1, page 45) – START Specify name and starting address for the program – END Indicate the end of the source program and (optionally) specify the first executable instruction in the program – BYTE Generate character or hexadecimal constant, occupying as many bytes as needed to represent the constant – WORD Generate one-word integer constant – RESB Reserve the indicated number of bytes for a data area – RESW Reserve the indicated number of words for a data area• Process assembler directives – No need to be translated into machine instructions because they provide instructions to the assembler
Assembly Program with Object Code
[Figure: assembly listing with object code, annotated to point out a forward reference.]
Basic Assembler Functions (continue)
• Forward references (Fig. 2.1, page 45)
  – Definition: a reference to a label that is defined later in the program
  – Line-by-line translation is problematic
• Solution: two passes
  – First pass: scan the source program for label definitions and assign addresses
  – Second pass: perform most of the actual instruction translation
Basic Assembler Functions (continue)
• Functions of the two passes
  – Pass 1 (define symbols): loop until the end of the program
    1. Read in a line of assembly code
    2. Assign an address to this line; increment the address by N (word addressing or byte addressing)
    3. Save the address values assigned to labels in the symbol table
    4. Process assembler directives (constant declaration, space reservation)
  – Pass 2 (assemble instructions and generate object program): same loop
    1. Read in a line of code
    2. Validate and translate the op code using the op code table
    3. Change labels to addresses using the symbol table
    4. Process assembler directives
    5. Produce the object program
• Example
  Loc   Source Statement       Object Code
  1000  COPY   START  1000
  1000  FIRST  STL    RETA     141033
  1003  LOOP   JSUB   RD       482039
  1006         LDA    LEN      001036
  1009         COMP   ZERO     281030
  100C         JEQ    ENDF     301015
  100F         JSUB   WR       482061
  1012         J      LOOP     3C1003
  1015  ENDF   LDA
  …
  102A  EOF    BYTE   C’EOF’   454F46
  …
  1033  RETA   RESW   1
  1036  LEN    RESW   1
  …
  2039  RD     LDX    ZERO     041030
  …
  2061  WR     LDX    ZERO     041030
  …
• Object program (fields separated by X):
  HXCOPY X001000X00107A
  TX001000X1EX141033X482039X001036 …
  …
  EX001000
Basic Assembler Functions (continue)
• Two-pass assembly structure
  – Source program → Pass 1 → Intermediate file → Pass 2 → Object program
  – Data structures used by the passes: OPTAB, LOCCTR, SYMTAB
• The intermediate file contains each source statement together with:
  – its assigned address
  – error indicators
  – pointers to OPTAB and SYMTAB
  – etc.
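The two-pass structure can be sketched as follows. This is a deliberately small Python illustration under several assumptions: only plain SIC instructions without indexing are handled, the source is given as pre-split (label, opcode, operand) tuples, and the sample program SOURCE is a shortened, hypothetical variant of the COPY program, so its addresses differ from the listing in the slides. The function names are also made up for the example.

```python
# Standard SIC opcodes for the mnemonics used below.
OPTAB = {"LDA": 0x00, "STL": 0x14, "COMP": 0x28, "JEQ": 0x30, "J": 0x3C}

def storage_size(op, operand):
    """Bytes reserved or generated by a storage directive, or None if unknown."""
    if op == "WORD":
        return 3
    if op == "RESW":
        return 3 * int(operand)
    if op == "RESB":
        return int(operand)
    if op == "BYTE":
        body = operand[2:-1]                       # text between the quotes
        return len(body) if operand[0] == "C" else len(body) // 2
    return None

def pass1(lines):
    """Assign an address to every statement and record label addresses in SYMTAB."""
    symtab, intermediate, locctr = {}, [], 0
    for label, op, operand in lines:
        if op == "START":
            locctr = int(operand, 16)
        if label:
            if label in symtab:
                raise ValueError(f"duplicate symbol {label}")
            symtab[label] = locctr
        intermediate.append((locctr, op, operand))
        if op in OPTAB:
            locctr += 3                            # every SIC instruction is 3 bytes
        elif op not in ("START", "END"):
            size = storage_size(op, operand)
            if size is None:
                raise ValueError(f"invalid operation code {op}")
            locctr += size
    return symtab, intermediate

def pass2(symtab, intermediate):
    """Translate each statement to object code using OPTAB and SYMTAB."""
    listing = []
    for loc, op, operand in intermediate:
        if op in OPTAB:
            if operand not in symtab:
                raise ValueError(f"undefined symbol {operand}")
            code = f"{OPTAB[op]:02X}{symtab[operand]:04X}"
        elif op == "WORD":
            code = f"{int(operand):06X}"
        elif op == "BYTE":
            body = operand[2:-1]
            code = body.encode("ascii").hex().upper() if operand[0] == "C" else body.upper()
        else:
            code = ""                              # START, END, RESW, RESB: no object code
        listing.append((loc, code))
    return listing

SOURCE = [
    ("COPY", "START", "1000"), ("FIRST", "STL", "RETA"), ("LOOP", "LDA", "LEN"),
    (None, "COMP", "ZERO"), (None, "JEQ", "ENDF"), (None, "J", "LOOP"),
    ("ENDF", "LDA", "EOF"), ("EOF", "BYTE", "C'EOF'"), ("ZERO", "WORD", "0"),
    ("RETA", "RESW", "1"), ("LEN", "RESW", "1"), (None, "END", "FIRST"),
]
symtab, intermediate = pass1(SOURCE)
listing = pass2(symtab, intermediate)
```

The forward reference in JEQ ENDF is resolved without special handling because pass 1 has already placed ENDF in SYMTAB by the time pass 2 translates the instruction.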
Basic Assembler Functions (continue)
• Output object program: the assembler must write the object code to some output device for later execution
• Simple object program format (Fig. 2.3, page 49)
  – Header record: contains the program name, starting address, and length
  – Text record: contains machine code (translated instructions and data) with an indication of the addresses where these are to be loaded
  – End record: marks the end of the object program (see textbook p. 49 for details)
Object Program Format
• Header record
  Col. 1      H
  Col. 2–7    Program name
  Col. 8–13   Starting address of object program (hex)
  Col. 14–19  Length of object program in bytes (hex)
• Text record
  Col. 1      T
  Col. 2–7    Starting address for object code in this record (hex)
  Col. 8–9    Length of object code in this record in bytes (hex)
  Col. 10–69  Object code, represented in hex (2 columns per byte)
• End record
  Col. 1      E
  Col. 2–7    Address of first executable instruction in object program (hex)
• In the example, 1033–2038 is storage reserved by the loader
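The column layout above maps directly onto string formatting. The Python sketch below is illustrative only: the function names are hypothetical, and text_records assumes the object code is contiguous, whereas a real assembler also starts a new Text record wherever RESW/RESB leaves a gap.

```python
def header_record(name, start, length):
    """Col. 1 'H', col. 2-7 name (padded or truncated to 6), then two 6-digit hex fields."""
    return f"H{name:<6.6}{start:06X}{length:06X}"

def text_records(start, codes):
    """Group hex object-code strings into Text records of at most 30 bytes (60 columns)."""
    records, current, current_start, address = [], "", start, start
    for code in codes:
        if len(current) + len(code) > 60:          # record full: flush it
            records.append(f"T{current_start:06X}{len(current) // 2:02X}{current}")
            current, current_start = "", address
        current += code
        address += len(code) // 2
    if current:
        records.append(f"T{current_start:06X}{len(current) // 2:02X}{current}")
    return records

def end_record(first_instruction):
    """Col. 1 'E', col. 2-7 address of the first executable instruction."""
    return f"E{first_instruction:06X}"
```

With the values from the example program, header_record("COPY", 0x1000, 0x107A) gives HCOPY  00100000107A, and ten 3-byte instructions starting at 1000 fill exactly one Text record of length 1E.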
Basic Assembler Functions (continue)
• Assembler data structures
  – Operation Code Table (OPTAB): mnemonic code → machine code
  – Symbol Table (SYMTAB): label → address
  – LOCCTR: a variable that accumulates the address being assigned; when a label is encountered, the current value of LOCCTR is the address of that label
• Assembler algorithm
  – See Fig. 2.4; practice with the examples in Fig. 2.1 and Fig. 2.2.
Data Structures for AssemblerOperation Code Table• Contents: – Mnemonic operation codes – Machine language equivalents – Instruction format and length• During pass 1: – Validate operation codes – Find the instruction length to increase LOCCTR• During pass 2: – Determine the instruction format – Translate the operation codes to their machine language equivalents • key: mnemonic code • result: bits• Implementation: a static hash table is usually used • once prepared, the table is not changed • efficient lookup is desired • since mnemonic code is predefined, the hashing function can be tuned a priori
Data Structures for Assembler (cont’d)
Symbol table
• Contents:
  – Label name
  – Label address
  – Flags (to indicate error conditions)
  – Data type or length
• During pass 1:
  – Store label name and assigned address (from LOCCTR) in SYMTAB
    • efficient insertion and retrieval is needed
    • deletion does not occur
• During pass 2:
  – Symbols used as operands are looked up in SYMTAB
• Implementation:
  – a dynamic hash table for efficient insertion and retrieval
  – Problem: it should perform well with non-random keys (LOOP1, LOOP2, X1, X2)
Why Program Relocation• To increase the productivity of the machine• Want to load and run several programs at the same time (multiprogramming)• Must be able to load programs into memory wherever there is room• Actual starting address of the program is not known until load time
Absolute Program
• A program whose starting address is specified at assembly time
• Example: the SIC assembly program in Fig. 2.2 starts at 1000 (COPY START 1000). The following statement means "load register A from memory address 1036":
    Instruction: 55   101B  LDA    THREE  001036   (operand address calculated from the starting address 1000)
    Instruction: 100  1036  THREE  RESW   1
• The address may be invalid if the program is loaded somewhere else.
What Needs to be Relocated• Need to be modified: – The address portion of those instructions that use absolute (direct) addresses.• Need not be modified: – Register-to-register instructions (no memory references) – PC or base-relative addressing (relative displacement remains the same regardless of different starting addresses)
How to Relocate Addresses• For Assembler – For an address label, its address is assigned relative to the start of the program (that’s why START 0) – provides loader with information about • which address needs fixing • length of address field – Produce a modification record to store the starting location and the length of the address field to be modified.• For loader – For each modification record, add the actual beginning address of the program to the address field at load time.
Format of Modification Record• One modification record for each address to be modified• The length is stored in half-bytes (20 bits = 5 half-bytes)• The starting location is the location of the byte containing the leftmost bits of the address field to be modified.• If the field contains an odd number of half-bytes, the starting location begins in the middle of the first byte.
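A loader's handling of one modification record can be sketched in Python as below. The record layout follows the slides (M, a 6-digit starting location, a 2-digit length in half-bytes); the function name relocate and the representation of the loaded program as a bytearray indexed from the start of the program are assumptions for this example.

```python
def relocate(image, load_address, record):
    """Add load_address to the address field named by one modification record.

    image is a bytearray holding the program, indexed from the program's first byte.
    """
    offset = int(record[1:7], 16)            # byte holding the leftmost bits of the field
    half_bytes = int(record[7:9], 16)        # field length in half-bytes
    n_bytes = (half_bytes + 1) // 2
    field = int.from_bytes(image[offset:offset + n_bytes], "big")
    mask = (1 << (4 * half_bytes)) - 1       # for an odd length, the first half-byte is kept
    patched = (field & ~mask) | ((field + load_address) & mask)
    image[offset:offset + n_bytes] = patched.to_bytes(n_bytes, "big")
```

For a format 4 instruction 4B101036 located at relative address 0006, the record M00000705 names the 5 half-bytes starting in the middle of byte 0007. Loading the program at 5000 turns the instruction into 4B106036, while the flag half-byte (the 1 after 4B) is left untouched.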
Machine-Dependent Assembler Features• Use register-to-register instructions whenever possible – Take advantage of additional registers – Reduce instructions length; – avoid memory reference; speed up• Use immediate addressing as much as possible – Avoid memory reference – Can be combined with relative addressing• Use indirect addressing as much as possible – Avoids the need for another instruction – Can be combined with relative addressing
Machine-Dependent Assembler Features• Most register-to-memory instructions are assembled using relative addressing – Reduce instruction length – simplify program relocation – The displacement should not overflow 12bits, otherwise use format 4; – using PC relative or Base relative is arbitrary, programmer’s choice
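The displacement calculation behind these rules can be sketched in Python. One ordering assumption is made here that the slide leaves open: PC-relative addressing is tried first, then base-relative, and format 4 is reported only when neither displacement fits in 12 bits. The function name and the returned mode labels are hypothetical.

```python
def choose_displacement(target, pc, base=None):
    """Return (mode, field) for a format 3 operand, or ('format4', target) if neither fits.

    pc is the address of the NEXT instruction; base is the B register value, if set.
    """
    disp = target - pc
    if -2048 <= disp <= 2047:                 # signed 12-bit displacement
        return "pc", disp & 0xFFF             # two's complement for negative values
    if base is not None:
        disp = target - base
        if 0 <= disp <= 4095:                 # unsigned 12-bit displacement
            return "base", disp
    return "format4", target
```

A backward jump from PC = 001A to address 0006 gives a displacement of -14 hex, which is stored as FEC. A reference to 0036 from PC = 1051 is out of PC-relative range, but fits as base-relative displacement 003 when B holds 0033.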
Machine-Dependent Assembler Features
Extended features reflected in code (Fig. 2.5)
• Prefix notation
  – @ : indirect addressing
  – # : immediate addressing
  – + : instruction format 4 is used (no displacement)
• Additional assembler directives
  – BASE: base-relative addressing mode may be used
  – NOBASE: cancel base-relative addressing
• Additional instructions
  – COMPR: compare the values in two registers (format 2)
Machine-Dependent Assembler Features
• Program relocation (Fig. 2.6, 2.7, 2.8)
  – Multiprogramming; shared memory
  – Load-time binding
  – Relocatable program instead of absolute program
  – Assembler generates relative addresses (assumes the program starts at 0)
  – Object program includes modification records:
      Col. 1    M
      Col. 2–7  Starting location of the address field to be modified, relative to the beginning of the program
      Col. 8–9  Length of the address field to be modified, in half-bytes
Machine-Dependent Assembler Features
• Address modification
  – Add the beginning address to the address field of an instruction
  – Instructions that need to be modified at load time
    • Those that specify direct addresses
    • For SIC/XE, only format 4 instructions
  – Instructions that need not be modified at load time
    • The operand is not a memory address
    • PC-relative or base-relative addressing is used
    • Immediate + relative addressing is used
Machine-Independent Assembler Features
• Literals (Fig. 2.9, 2.10)
  – The value of a constant operand is stated directly in the instruction
  – Label and BYTE statement are avoided
  – Same effect as using a BYTE statement, same object code
  – Prefix notation: =, followed by a specification of the literal value
  – Examples:
      45   001A  ENDFIL  LDA  =C’EOF’  032010
      215  1062  WLOOP   TD   =X’05’   E32011
• Literal vs. immediate operand
  – Literal: the assembler generates the specified value as a constant at some other memory location
  – Immediate: the operand value is assembled as part of the machine instruction
Machine-Independent Assembler Features• Literal pool (Fig.2.9, 2.10) – At the end of the program – At certain locations in the program • use directive LTORG • containing all the literal operands used since previous LTORG • Keep the literal operand close to the instruction that uses it • Enable relative addressing, avoid using instruction format 4
Machine-Independent Assembler Features• Duplicate literals – The same literal used in more than one place – Store only one copy of the specified data value – The literals =C’EOF’ and =X’454F46’ have identical operand values – Problems: same literal name, different values • Literals whose value depends upon their location in the program • when a literal refers to any item whose value changes (location counter)
Machine-Independent Assembler Features
• Assembly process for literals
  – Literal table (LITTAB): literal name, operand value and length, assigned address
  – Pass 1
    • Literal operand recognized: search LITTAB, and add the literal to LITTAB if it is not present
    • LTORG or end of program encountered: scan LITTAB and assign an address to each literal
  – Pass 2
    • Literal operand encountered: search LITTAB to obtain the operand address
    • Insert the data values of the literals into the appropriate places in the object program
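A literal table along these lines might look like the Python sketch below. The class and method names are hypothetical, and the table is keyed by operand value rather than by literal name, which is one simple way (an assumption here, not the textbook's stated design) to store duplicates such as =C'EOF' and =X'454F46' only once. Literals whose value depends on their location are not handled.

```python
def literal_value(literal):
    """Return the operand value of a literal such as =C'EOF' or =X'05' as a hex string."""
    kind, body = literal[1], literal[3:-1]
    if kind == "C":
        return body.encode("ascii").hex().upper()
    if kind == "X":
        return body.upper()
    raise ValueError(f"unsupported literal {literal}")

class LiteralTable:
    def __init__(self):
        self.entries = {}                    # operand value -> {"name", "address"}

    def add(self, literal):
        """Pass 1: record the literal unless an identical value is already present."""
        value = literal_value(literal)
        self.entries.setdefault(value, {"name": literal, "address": None})

    def assign_addresses(self, locctr):
        """At LTORG or END: place every unassigned literal and return the new LOCCTR."""
        for value, entry in self.entries.items():
            if entry["address"] is None:
                entry["address"] = locctr
                locctr += len(value) // 2    # two hex digits per byte
        return locctr

    def address_of(self, literal):
        """Pass 2: look up the address assigned to a literal operand."""
        return self.entries[literal_value(literal)]["address"]
```

Calling assign_addresses again at a later LTORG leaves already-placed literals where they are and only places the ones added since.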
Machine-Independent Assembler Features
• Symbol-defining statements (Fig. 2.9, 2.10)
  – EQU statement: symbol EQU value
    • Directly assigns values to symbols
    • Inserts symbols into SYMTAB
  – ORG statement: ORG value
    • Indirectly assigns values to symbols
    • Resets LOCCTR to the specified value
    • Affects the values of all labels defined until the next ORG
    • Useful when defining the internal structure of the symbol table
  – Restrictions: values should be constants or expressions involving constants and previously defined symbols
Machine-Independent Assembler Features
• EQU statement examples
  – Use a symbol instead of numeric values; this improves readability and makes values easy to find and change:
      MAXLEN  EQU   4096
              +LDT  #MAXLEN
  – Define mnemonic names for registers; some instructions (e.g. RMO) may require register numbers instead of names:
      A  EQU  0
      X  EQU  1
      L  EQU  2
  – Define general-purpose registers as special registers:
      BASE   EQU  R1
      COUNT  EQU  R2
      INDEX  EQU  R3
Machine-Independent Assembler Features
• ORG statement example: a symbol table STAB with 100 entries, each consisting of SYMBOL (6 bytes), VALUE (3 bytes), and FLAGS (2 bytes)
    STAB    RESB  1100        ;reserve space for the symbol table
            ORG   STAB        ;reset LOCCTR to the value of STAB
    SYMBOL  RESB  6           ;assign to SYMBOL the address STAB
    VALUE   RESW  1           ;assign to VALUE the address STAB+6
    FLAGS   RESB  2           ;assign to FLAGS the address STAB+9
            ORG   STAB+1100   ;set LOCCTR to its previous value
Machine-Independent Assembler Features
• Expressions
  – Absolute expressions
    • An expression that contains only absolute terms
    • Or an expression that contains relative terms which occur in pairs, where the terms in each pair have opposite signs
  – Relative expressions
    • An expression in which all the relative terms except one can be paired as described above; the remaining unpaired relative term must have a positive sign
  – Error expressions
    • Neither absolute nor relative expressions
    • Should be flagged by the assembler as errors
Machine-Independent Assembler Features
• Expression terms
  – Constant (absolute term)
  – Label (relative term)
  – Symbol defined by EQU (absolute or relative term, depending on the expression used to define its value)
  – Special term * used to refer to LOCCTR (relative)
• Type flag in SYMTAB
  Symbol  Type  Value
  RETADR  R     0030
  BUFFER  R     0036
  BUFEND  R     1036
  MAXLEN  A     1000
Machine-Independent Assembler Features
• Expression rules
  – Legal expressions are those whose value remains meaningful when the program is relocated
  – None of the relative terms may enter into a multiplication or division operation
• Expression example (Fig. 2.9, 2.10)
    107  MAXLEN  EQU  BUFEND-BUFFER
• Illegal expressions
  – BUFEND + BUFFER
  – 100 – BUFFER
  – 3 * BUFFER
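The pairing rule for relative terms reduces to counting signs, as the Python sketch below shows. The term encoding (sign, type flag, and whether the term takes part in a multiplication or division) and the function name are assumptions made for illustration; parsing the expression text itself is left out.

```python
def classify(terms):
    """Classify an expression as 'absolute', 'relative', or 'error'.

    terms is a list of (sign, kind, in_mul_div) where sign is +1 or -1,
    kind is 'A' (absolute) or 'R' (relative), and in_mul_div tells whether
    the term is an operand of * or /.
    """
    if any(kind == "R" and in_mul_div for _, kind, in_mul_div in terms):
        return "error"                         # relative terms may not be multiplied or divided
    net = sum(sign for sign, kind, _ in terms if kind == "R")
    if net == 0:
        return "absolute"                      # relative terms cancel in +/- pairs
    if net == 1:
        return "relative"                      # exactly one unpaired, positive relative term
    return "error"
```

BUFEND-BUFFER is encoded as [(+1, "R", False), (-1, "R", False)] and comes out absolute, while each of the three illegal expressions listed above comes out as an error.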
Machine-Independent Assembler Features• Program Blocks (Fig.2.11, 2.12) – Segments of code that are rearranged within a single object program unit – Each program block may contain several separate segments of the source program – The assembler provides reorganization
Machine-Independent Assembler Features• Benefits of using program blocks – Move large buffer area to the end of the object program, avoid using format 4 – Base register is avoided – Place literals ahead of any large data areas – Separate source program order from object program order
Machine-Independent Assembler Features• Assembler directive USE – indicates which portions of the source program belong to the various blocks – Example: 92 USE CDATA ;begin block named CDATA 103 USE CBLKS ;begin block named CBLKS 183 USE ;resume the default block
Machine-Independent Assembler Features
• Assembler handling for program blocks (Fig. 2.12)
  – Pass 1
    • Separate LOCCTR for each block, initialized to 0
    • Save and restore LOCCTR values when switching between blocks
    • Each label is assigned an address relative to the start of the block that contains it, and the label address is stored with the block number in SYMTAB
    • Construct a table that contains the starting addresses and lengths of all blocks
  – Pass 2
    • Generate the address of each symbol relative to the start of the object program
    • Access SYMTAB, and add the location of the symbol to the starting address of its block
Machine-Independent Assembler Features
• Block table (Fig. 2.12)
  Block name  Block number  Address  Length
  (default)   0             0000     0066
  CDATA       1             0066     000B
  CBLKS       2             0071     1000
• SYMTAB
  Symbol  Address  Block number
  LENGTH  0003     1
• Object program respecting program blocks (see Fig. 2.13, 2.14)
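The address arithmetic for program blocks is small enough to show in full. In this Python sketch the function names are hypothetical; the block lengths and the LENGTH entry are the values from the tables above.

```python
def block_starts(block_lengths):
    """Given (name, length) pairs in block-number order, return each block's starting address."""
    starts, address = {}, 0
    for name, length in block_lengths:
        starts[name] = address          # a block begins where the previous one ended
        address += length
    return starts

def absolute_address(symbol_entry, starts):
    """Pass 2: add the symbol's block-relative address to the start of its block."""
    relative_address, block_name = symbol_entry
    return starts[block_name] + relative_address

STARTS = block_starts([("(default)", 0x0066), ("CDATA", 0x000B), ("CBLKS", 0x1000)])
LENGTH_ADDRESS = absolute_address((0x0003, "CDATA"), STARTS)
```

The computed starting addresses are 0000, 0066, and 0071, matching the block table, and LENGTH (0003 in block 1) ends up at 0069 relative to the start of the object program.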
Machine-Independent Assembler Features• Control sections (Fig.2.15, 2.16) – Segments that are translated into independent object program units. – Each control section can be loaded and relocated independently of others. – Programmer can assemble and manipulate each control section separately – Mostly used for subroutines or other logical subdivisions of a program• Assembler directive CSECT – signals the start of a new control section
Machine-Independent Assembler Features• External reference (Fig.2.15, 2.16) – References between control sections – Assembler has no idea about other control sections’ location at execution time – Assembly directive EXTDEF (external definition) and EXTREF EXTDEF Define external symbols that may be used by other sections EXTREF Names symbols that are used in this control section and are defined elsewhere – Control section names are automatically external symbols
Machine-Independent Assembler Features
• Assembler handling for control sections
  – Separate LOCCTR for each section, initialized to 0
  – Inserts an address of zero for each external reference:
      15   0003  CLOOP   +JSUB  RDREC          4B100000
      160  0017          +STCH  BUFFER,X       57900000
      190  0028  MAXLEN  WORD   BUFEND-BUFFER  000000
  – Format 4 has to be used for external references (relative addressing is not possible)
  – The assembler must remember (via entries in SYMTAB) in which control section a symbol is defined
  – References to unidentified external symbols are flagged as errors
  – The same symbol name can be used in different sections
Machine-Independent Assembler Features• Object program respecting control sections (Fig.2.17) – Define record – Refer record – Modification record (See pp.89 for details)
Assembler Design Options
• One-pass assemblers
  – Must solve the problem of forward references
  – Define data items before they are referenced
  – Special handling of symbols
• Two types of one-pass assemblers
  1. Produces object code directly in memory for immediate execution (load-and-go)
  2. Produces an object program for later execution
Assembler Design Options• Load-and-go assembler features – Produce object code directly in memory – Load and go, no loader is needed – Efficient assembly process, good for program development and testing – Generate absolute code at assembly time
Assembler Design Options• Load-and-go assembler handling for symbols (Fig. 2.18, 2.19) – Encounter a symbol operand that hasn’t been defined – Omit the operand address – Enter the symbol into SYMTAB, flag it as undefined – Add the address of this symbol operand to a list of forward references associated with the SYMTAB entry – Encounter the definition for a symbol – Scan the forward reference list for this symbol – Insert the proper symbol address into the listed address
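The forward-reference handling above can be sketched in Python as follows. This is a simplified illustration, not the textbook algorithm verbatim: memory is modeled as a dictionary from operand-field address to value, each SYMTAB entry holds a value and a list of pending references, and all names are assumptions made for the example.

```python
def reference(symtab, memory, name, operand_addr):
    """Handle a symbol used as an operand at operand_addr."""
    entry = symtab.setdefault(name, {"value": None, "refs": []})
    if entry["value"] is None:
        # Symbol not yet defined: leave the address empty and remember where it goes.
        memory[operand_addr] = 0
        entry["refs"].append(operand_addr)
    else:
        memory[operand_addr] = entry["value"]


def define(symtab, memory, name, value):
    """Handle the definition of a symbol and patch all pending references."""
    entry = symtab.setdefault(name, {"value": None, "refs": []})
    entry["value"] = value
    for addr in entry["refs"]:
        memory[addr] = value
    entry["refs"].clear()


def undefined_symbols(symtab):
    """Symbols still flagged as undefined at the end of assembly (errors)."""
    return sorted(name for name, e in symtab.items() if e["value"] is None)
```

Any symbol that `undefined_symbols` still reports once the whole program has been read was referenced but never defined, which a load-and-go assembler would report as an error instead of starting execution.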
Assembler Design Options• Features of one-pass assemblers that output object programs – Produce object programs as output – Used on systems where external storage is slow (no intermediate file between passes is needed) – Generate extra Text records in the object program to handle forward references – The addresses for forward references are inserted at load time
Assembler Design Options• One-pass assembler handling for symbols (output object programs) (Fig. 2.20) – Encounter a symbol operand that hasn’t been defined – Generate the operand address as 0000 – Enter the symbol into SYMTAB, flag it as undefined – Add the address of this symbol operand to a list of forward references associated with the SYMTAB entry – Encounter the definition for a symbol – Scan the forward reference list for this symbol – Generate a Text record for each listed address so that the loader inserts the proper operand address there
Assembler Design Options• Multi-pass assemblers (Fig.2.21) – Eliminate the prohibition of forward references in symbol definitions – Make as many passes as are needed to process the definitions of symbols – The assembler still makes only two passes over the entire program – During Pass 1, the additional passes scan only the stored symbol definitions that involve forward references – Finally, a normal Pass 2 is made
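A minimal Python sketch of the idea: symbol definitions that depend on not-yet-defined symbols are stored, and only those stored definitions are rescanned until every one is resolved. The data layout (a dependency tuple plus a function per pending symbol) and the sample values are assumptions for illustration; a real assembler would typically keep dependency lists in SYMTAB rather than rescanning.

```python
def resolve_symbols(known, pending):
    """Resolve stored symbol definitions that involve forward references.

    known   : dict mapping already-defined symbols to values
    pending : dict mapping symbol -> (dependencies, function of those values)
    """
    symtab = dict(known)
    remaining = dict(pending)
    while remaining:
        progress = False
        # One extra "pass" over the stored definitions only, not the whole program.
        for name, (deps, compute) in list(remaining.items()):
            if all(d in symtab for d in deps):
                symtab[name] = compute(*(symtab[d] for d in deps))
                del remaining[name]
                progress = True
        if not progress:
            raise ValueError(f"unresolvable symbols: {sorted(remaining)}")
    return symtab
```

For example, with BUFFER and BUFEND already known, a definition such as `HALFSZ EQU MAXLEN/2` that appears before `MAXLEN EQU BUFEND-BUFFER` is resolved on the second scan of the stored definitions.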
Implementation Examples• MASM Assembler• SPARC Assembler (No lecture; students should read these on their own)
Loaders and Linkers• Concepts to be learned – Absolute loader, relocatable loader, linking loader, bootstrap loader – Independent assembly/compilation and program linking – Static and dynamic program libraries – Linkage editors and linking loaders – Bootstrap loaders and program loaders
Chapter Overview Loaders and Linkers• Some concepts and definitions – Loading, which brings the object program into memory for execution – Relocation, which modifies the object program so that it can be loaded at an address different from the location originally specified – Linking, which combines two or more separate object programs and supplies the information needed to allow references between them – Loader, a system program that performs the loading function, and may also support relocation and linking
Basic Loader Functions• Design of an absolute loader (Fig.3.1, 3.2) – A single pass – Check the Header record to verify the program name and size – Read each Text record and move its object code to the indicated memory address – Read the End record and jump to the specified address – Execute the program
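The single pass of an absolute loader can be sketched in Python as below. The sketch assumes the usual SIC record layout in character form (Header: name in columns 2-7, start address in 8-13, length in 14-19; Text: address in 2-7, byte count in 8-9, object code from column 10; End: address in 2-7). Memory is modeled as a `bytearray`, and the function returns the address to jump to instead of actually transferring control.

```python
def absolute_load(records, memory, expected_name=None):
    """Load an absolute object program into memory; return the execution address."""
    header = records[0]
    if not header.startswith("H"):
        raise ValueError("first record must be a Header record")
    name = header[1:7].strip()
    start = int(header[7:13], 16)
    length = int(header[13:19], 16)
    if expected_name is not None and name != expected_name:
        raise ValueError(f"unexpected program name {name!r}")
    if start + length > len(memory):
        raise ValueError("program does not fit in memory")

    for rec in records[1:]:
        if rec.startswith("T"):
            addr = int(rec[1:7], 16)
            nbytes = int(rec[7:9], 16)
            # Convert the hexadecimal characters to packed bytes.
            code = bytes.fromhex(rec[9:9 + 2 * nbytes])
            if addr < start or addr + nbytes > start + length:
                raise ValueError("Text record outside the program's address range")
            memory[addr:addr + nbytes] = code
        elif rec.startswith("E"):
            return int(rec[1:7], 16)  # the caller jumps here to begin execution
    raise ValueError("missing End record")
```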
Basic Loader Functions• Representation of object program – Hexadecimal representation in character form (wastes memory space and execution time)
Character in object program  '0' '1' '2' '3' '4' '5' '6' '7' '8' '9' 'A' 'B' 'C' 'D' 'E' 'F'
ASCII code (hex)             30  31  32  33  34  35  36  37  38  39  41  42  43  44  45  46
Internal value (hex)         0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
Difference (hex)             30  30  30  30  30  30  30  30  30  30  37  37  37  37  37  37
– In binary form (saves memory space and execution time, but is harder to read)
Basic Loader Functions• A simple bootstrap loader (Fig. 3.3) – The loader program begins at address 0 in memory – Loads the first program to be run by the computer (usually the OS) – Loads the object code into consecutive bytes of memory, starting at address 80 – Simplified object program (contains only object code; no Header record, End record, or control information) – Object code is represented as hexadecimal digits in character form – The loader must convert each ASCII character code to the value of the hexadecimal digit that the character represents
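The character-to-value conversion amounts to subtracting hex 30 from the codes of '0' through '9' and hex 37 from the codes of 'A' through 'F', then packing two digits per byte. A minimal Python sketch follows; the default load address of hex 80 is an assumption (the slide gives "80" without a radix), and skipping whitespace between digits is a simplification.

```python
def hex_digit_value(ch):
    """Convert one ASCII hex character to its numeric value."""
    code = ord(ch)
    if 0x30 <= code <= 0x39:      # '0'..'9': difference is hex 30
        return code - 0x30
    if 0x41 <= code <= 0x46:      # 'A'..'F': difference is hex 37
        return code - 0x37
    raise ValueError(f"not a hexadecimal digit: {ch!r}")


def bootstrap_load(chars, memory, load_addr=0x80):
    """Pack pairs of hex characters into consecutive bytes starting at load_addr.

    Returns the address just past the last byte loaded.
    """
    digits = [hex_digit_value(c) for c in chars if not c.isspace()]
    if len(digits) % 2:
        raise ValueError("object code must contain an even number of hex digits")
    for i in range(0, len(digits), 2):
        memory[load_addr + i // 2] = (digits[i] << 4) | digits[i + 1]
    return load_addr + len(digits) // 2
```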
Machine-dependent loader Features• Program relocation and relocation loaders (or relative loaders) – For SIC/XE, processing Modification records (Fig. 3.4, 3.5) – For SIC, processing a relocation bit mask (Fig. 3.6, 3.7) • Modification records are not well suited to SIC: because SIC has no relative or immediate addressing, almost every instruction address needs to be modified • One relocation bit is assigned to each word of object code (that is, to each SIC instruction) • The relocation bits are gathered together into a bit mask (3 hexadecimal digits) following the length indicator in each Text record
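Relocation with a bit mask can be sketched in Python as follows. The sketch assumes that the leftmost bit of the 12-bit mask corresponds to the first word of the Text record, and that a set bit means the program's load address is added to that 24-bit word; the function name is hypothetical.

```python
def relocate_text_record(words, mask, load_addr):
    """Apply a 12-bit relocation mask to the 24-bit words of one Text record."""
    if len(words) > 12:
        raise ValueError("a 12-bit mask covers at most 12 words")
    relocated = []
    for i, word in enumerate(words):
        bit = (mask >> (11 - i)) & 1          # leftmost bit belongs to the first word
        if bit:
            word = (word + load_addr) & 0xFFFFFF
        relocated.append(word)
    return relocated
```

For instance, with mask hex C00 (binary 1100 0000 0000) only the first two words of the record are adjusted, and any later word, such as a constant, is left unchanged.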
Machine-dependent loader Features• Program linking and linking loaders (Fig.3.8,3.9,3.10) – Loader processes Define record, Refer record, and Modification record – The assembler will evaluate as much of the expression as it can – The remaining terms are passed on to the loader via Modification records
Machine-dependent loader Features• Linking loader data structures (Fig.3.11) – External symbol table (ESTAB), with columns: control section, symbol name, address, length – Program load address (PROGADDR) • The beginning address in memory where the linked program is to be loaded • OS supplies the value of PROGADDR – Control section address (CSADDR) • The starting address assigned to the control section currently being scanned by the loader • Loader uses this value to convert relative addresses to actual addresses within the control section – Control section length (CSLTH) – Execution address (EXECADDR)
Machine-dependent loader Features• Linking loader algorithms (Fig.3.11) – Pass 1 • Process only the Header and Define records in the object program • Construct ESTAB • Assign an address to each control section • Assign addresses to external symbols – Pass 2 • Process the Text and Modification records in the object program • Perform the actual loading, relocation, and linking • For each Text record, move the object code to the specified address (plus the current value of CSADDR) • For each Modification record, look up the specified symbol in ESTAB and add its value to, or subtract it from, the field at the specified address (plus the current value of CSADDR)
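The two passes can be sketched in Python as below. To keep the example short, each control section is given as an already-parsed dictionary (an assumption; a real loader would read Header, Define, Text, and Modification records), ESTAB is a plain dictionary from symbol to address, and memory is a `bytearray`.

```python
def pass1(sections, progaddr):
    """Build ESTAB: assign addresses to control sections and external symbols."""
    estab = {}
    csaddr = progaddr
    for sec in sections:
        if sec["name"] in estab:
            raise ValueError(f"duplicate external symbol {sec['name']}")
        estab[sec["name"]] = csaddr
        for symbol, rel_addr in sec["defs"].items():
            if symbol in estab:
                raise ValueError(f"duplicate external symbol {symbol}")
            estab[symbol] = csaddr + rel_addr
        csaddr += sec["length"]               # next section starts after CSLTH bytes
    return estab


def pass2(sections, estab, memory):
    """Load the object code and apply the Modification records."""
    for sec in sections:
        csaddr = estab[sec["name"]]
        for rel_addr, code in sec["text"]:
            loc = csaddr + rel_addr
            memory[loc:loc + len(code)] = code
        for rel_addr, half_bytes, sign, symbol in sec["mods"]:
            if symbol not in estab:
                raise KeyError(f"undefined external symbol {symbol}")
            loc = csaddr + rel_addr
            nbytes = (half_bytes + 1) // 2
            field = int.from_bytes(memory[loc:loc + nbytes], "big")
            # An odd length means the field starts in the low half of the first byte.
            mask = (1 << (4 * half_bytes)) - 1
            delta = estab[symbol] if sign == "+" else -estab[symbol]
            low = ((field & mask) + delta) & mask
            field = (field & ~mask) | low
            memory[loc:loc + nbytes] = field.to_bytes(nbytes, "big")
```

In this sketch the first control section is placed at PROGADDR and each later one follows immediately after the previous section, so every external symbol has a final address before any object code is moved.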
Machine-dependent loader Features• Transfer address (Fig.3.11) – Loader performs the transferring of control to the loaded program to begin execution • Normally, a transfer address is placed in the End record for a main program, not for a subroutine • If more than one control section specifies a transfer address, the loader arbitrarily uses the last one encountered. • If no control section contains a transfer address, the loader uses the beginning of the linked program (PROGADDR) as the transfer point – Alternatively, user can enter a separate Execute command to specify the transfer address (some systems)
Machine-dependent loader Features• Reference number (Fig.3.12) – In the Refer record, a reference number is assigned to each external symbol – In Modification records, reference numbers are used instead of symbol names – 01 is usually assigned to the control section name – Avoids multiple searches of ESTAB for the same symbol during the loading of a control section – The values for code modification are obtained by simply indexing into an array of these values
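A short Python sketch of the array lookup: ESTAB is searched once per symbol when the Refer record is read, and every Modification record afterwards is a plain index operation. It assumes reference numbers are assigned consecutively from 02 in Refer-record order, with 01 reserved for the control section itself; the function name is hypothetical.

```python
def build_reference_table(csect_name, refer_names, estab):
    """Return a list indexed by reference number (index 0 is unused)."""
    table = [None, estab[csect_name]]         # 01 is the control section name
    for name in refer_names:                  # 02, 03, ... in Refer-record order
        table.append(estab[name])
    return table
```

With such a table, a Modification record that names reference number 02 only needs `table[2]`, with no further ESTAB search.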
Machine-Independent Loader Features• Using program Libraries – Assembled or compiled versions of the subroutines (object programs) in organized structure – Allow programmer to use subroutines from one or more libraries as part of the programming language – Library subroutines are automatically fetched, linked with the main program and loaded. – Standard system library (automatically incorporated) • I/O library, math library, graphics libraries, etc. – Other libraries (specified by parameters to the loader) • C library, Java library (JDK), etc.
Machine-Independent Loader Features• Program libraries – Organized collection of object programs [Figure: source programs (RDREC, WRREC, ERRHANDLER) are assembled/compiled into object programs, which are then organized into a program library with an index]
Machine-Independent Loader Features• Automatic library search – Keep track of the external symbols that are referred to – In Pass 1, enter the symbols from each Refer record into ESTAB, marked undefined – When the definition is encountered, the address assigned to the symbol is filled in to complete the entry – At the end of Pass 1, the symbols in ESTAB that remain undefined represent unresolved external references – The loader searches the specified library or libraries for routines that contain the definitions of these symbols – The loader processes the library subroutines it finds, which may contain further external references – The library search is repeated until all external references are resolved
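The repeated search can be sketched in Python as below. Sections are simplified to a name, the symbols they define, and the symbols they refer to, and the library is modeled as a directory from symbol name to section; these structures and the function name are assumptions for illustration only.

```python
def link_with_library(program_sections, library):
    """Return (names of loaded sections, symbols that remain unresolved)."""
    loaded = []
    defined = set()
    referenced = set()
    queue = list(program_sections)
    while True:
        for sec in queue:
            loaded.append(sec["name"])
            defined.add(sec["name"])          # section names are external symbols
            defined.update(sec["defs"])
            referenced.update(sec["refs"])
        unresolved = referenced - defined
        # Search the library for routines that define the unresolved symbols.
        queue = []
        for symbol in sorted(unresolved):
            sec = library.get(symbol)
            if sec is not None and sec["name"] not in loaded and sec not in queue:
                queue.append(sec)
        if not queue:                          # nothing new found: search is finished
            return loaded, sorted(unresolved)
```

Because routines supplied by the programmer are already defined after the first round, the library is never searched for them, which is how a programmer-supplied routine takes the place of a library routine with the same name.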
Machine-Independent Loader Features• Static Libraries – Functions from static libraries are linked/loaded before execution time [Figure: COPY (containing JSUB RDREC, JSUB WRREC, and, on error, JSUB ERRHANDLER) and an indexed library containing RDREC, WRREC, and ERRHANDLER are combined by the linking loader]
Machine-Independent Loader Features• Overriding – Programmer supplies his or her own routines instead of library routines by using the same routine names – Programmer defined routines are included as input to the loader – By the end of Pass1, ESTAB already contains a complete entry for each of the programmer defined routines. – Library search for those routines is avoided.
Machine-Independent Loader Features• Library directory – It is not efficient to search libraries by scanning the Define records for all of the object programs on the library – Perform library search on the directory, which is constructed as part of the library – Directory gives the name of each routine and a pointer to its address within the file – If a subroutine is to be callable by more than one name, both names are entered into the directory, but only one copy of the object program is stored – Directory for commonly used libraries may be kept in memory permanently
Machine-Independent Loader Features• Loader options and commands – Select alternative sources of input: INCLUDE program-name(library-name) – Delete external symbols or entire control sections: DELETE symbol-name or DELETE csect-name – Change external symbol name1 to name2 wherever it appears in the object program: CHANGE name1,name2 – Specify alternative libraries to be searched before the standard system libraries: LIBRARY MYLIB – Leave external references unresolved: NOCALL symbol-name
Machine-Independent Loader Features• Other loader options – Specify that no external references be resolved by library search – Specify whether a load map is output – Specify the location at which execution is to begin (overriding any information given in the object programs) – Control whether or not the loader should attempt to execute the program if errors are detected during the load
Machine-Independent Loader Features• Loader options examples – Use the library routines READ and WRITE in place of the programmer-defined routines RDREC and WRREC, without reassembling the program: INCLUDE READ(UTLIB); INCLUDE WRITE(UTLIB); DELETE RDREC, WRREC; CHANGE RDREC, READ; CHANGE WRREC, WRITE – Suppose it is known that in a particular execution the routines STDDEV, PLOT, and CORREL will not be called. The following command instructs the loader to leave those external references unresolved (avoiding the overhead of loading and linking them, and saving memory space): NOCALL STDDEV, PLOT, CORREL
Loader Design Options• Linkage Editors (Fig.3.13) – Produces a linked version of the program (load module) before loading – Performs relocation of all control sections relative to the start of the linked program – Simplifies the loading process • Only a simple relocating loader is needed • Less modification is needed • All the items that need to be modified have values that are relative to the start of the linked program • No ESTAB is required • A single pass is enough
Loader Design Options• Linkage editor options and commands – Perform relocation if the starting address of a program is known – Replace a subroutine in the linked version of a program – Build packages of subroutines or other control sections that are generally used together – Allow external references to remain unresolved
Loader Design Options• Dynamic Linking (dynamic loading, load on call) – Postpones the linking function until execution time – A subroutine is loaded and linked to the rest of the program when it is first called• Advantages of dynamic linking – Allows several executing programs to share one copy of a subroutine or library – Rarely called subroutines don’t need to be loaded into memory every time the program is run – Allows a subroutine to be modified without changing the programs that call it
Machine-Independent Loader Features• Dynamic libraries – Functions from dynamic libraries are loaded and linked during execution time [Figure: COPY (containing JSUB RDREC, JSUB WRREC, and, on error, JSUB ERRHANDLER) is processed by the static linking loader, while RDREC, WRREC, and ERRHANDLER remain in the indexed library and are brought in by the dynamic linking loader]
Loader Design Options• Dynamic Linking process (Fig.3.14) – A subroutine is called – A load-and-call service request is sent to the OS – The OS examines its internal table – If the subroutine is not loaded, the OS loads it from the library – The OS passes control to the routine being called – The subroutine completes its processing and returns control to the OS – The OS does some processing if necessary (for example, deciding whether to release the memory) – The OS returns control to the program that issued the request – This is called execution-time binding, which gives more capabilities at a higher cost (OS intervention)
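The load-and-call sequence can be modeled with a small Python sketch. The class and attribute names are hypothetical, the library is represented as a dictionary of callables, and loaded routines are simply kept in the internal table for later calls (one of the policies the OS may choose).

```python
class LoadOnCallService:
    """Toy model of an OS service that loads a routine on its first call."""

    def __init__(self, library):
        self.library = library     # routine name -> callable (the "library")
        self.loaded = {}           # internal table of routines already in memory
        self.load_count = 0

    def load_and_call(self, name, *args):
        routine = self.loaded.get(name)      # examine the internal table
        if routine is None:
            routine = self.library[name]     # load from the library on first use
            self.loaded[name] = routine
            self.load_count += 1
        result = routine(*args)              # pass control to the routine
        # The routine has returned to the service, which could now release the
        # routine's memory; this sketch keeps it loaded for later calls.
        return result                        # control returns to the caller
```

A routine that is never called is never loaded, and a second call to the same routine finds it in the internal table and skips the load step.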
Loader Design Options• Dynamic loading of libraries – The entire library is loaded, used, and unloaded during execution time under program control [Figure: COPY is processed by the static linking loader; at run time it loads the library (containing RDREC and WRREC, with an index) through the dynamic loader, gets the function addresses, executes JSUB RDREC and JSUB WRREC, and then closes the library]
Loader Design Options• Bootstrap loaders – Solution 1 • An absolute loader program is permanently resident in ROM • Start execution when some hardware signal occurs • Executed directly in the ROM or copied to main memory and executed there • Load the OS or any other stand-alone programs – Solution 2 • A bootstrap loader is added to the beginning of all object programs that are to be loaded into an empty and idle system • Built-in hardware function reads the first record of the loader into memory at a fixed location • If necessary, this record will read more records until the whole bootstrap loader is loaded into the memory • The bootstrap loader loads the absolute program that follows