Assembler

3,921 views

Published on

Published in: Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
3,921
On SlideShare
0
From Embeds
0
Number of Embeds
3
Actions
Shares
0
Downloads
176
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

Assembler

  1. 1. Assembler 1
  2. 2. System Software• components – translator • assembler • compiler • interpreter – system manager • operating system – other utilities • loader • linker • DBMS, editor, debugger, ...• purpose of this course – understand how to build system software – understand how these components work 2
  3. 3. Issues in System Software• not many in this area – mature area• advanced architectures complicates system software – superscalar CPU – memory model – multiprocessor• new applications – embedded systems – mobile/ubiquitous computing 3
  4. 4. Assembler Overview• functions – translate programs written in assembly language to machine code • mnemonic code to machine code • symbols to addresses – handles • constants • literals • addressing• 32 bit constant or address• 32 bit offset 4
  5. 5. Assembler Overview (cont’d)• pass 1: loop until the end of the program 1. read in a line of assembly code 2. assign an address to this line • increment N (word addressing or byte addressing) 3. save address values assigned to labels • in symbol tables 4. process assembler directives • constant declaration • space reservation• pass2: same loop 1. read in a line of code 2. translate op code using op code table 3. change labels to address using the symbol table 4. process assembler directives 5. produce object program 5
  6. 6. Data Structures for Assembler add $t0, $t1, $t2 000000 01001 01010 01000 00000 100000• op code table – looked up for the translation of mnemonic code • key: mnemonic code • result: bits – hashing is usually used • once prepared, the table is not changed • efficient lookup is desired • since mnemonic code is predefined, the hashing function can be tuned a priori – the table may have the instruction format and length • to decide where to put op code bits, operands bits, offset bits • for variable instruction size • used to calculate the address 6
  7. 7. Data Structures for Assembler (cont’d) .text .globl main• symbol table main: la $t0, array – stored and looked up to assign lw $t1, count address to labels lw $t2, ($t0) loop: • efficient insertion and retrieval lw $t3, 4($t0) is needed ble $t3, $t2, loop2 • deletion does not occur move $t2, $t3 – difficulties in hashing loop2: add $t1, $t1, -1 add $t0, $t0, 4 • non random keys bnez $t1, loop – problem … • the size varies widely …. .data array: .word 3, 5, 5, 1, 6, 7, ….. count: .word 15 string1: .asciiz “nmax = “ 7
  8. 8. Symbol Table Construction .text .globl main symbol name valuemain: main 0 la $t0, array lw $t1, count loop 12 lw $t2, ($t0)loop: loop2 24 lw $t3, 4($t0) ble $t3, $t2, loop2 … move $t2, $t3 array 408loop2: add $t1, $t1, -1 count 468 add $t0, $t0, 4 bnez $t1, loop string1 472 … bad 478 …. .dataarray: .word 3, 5, 5, 1, 6, 7, …..count: .word 15string1: .asciiz “nmax = “bad: .word 7 8
  9. 9. Assembler Algorithm: pass1begin if starting address is given LOCCTR = starting address; else LOCCTR = 0; while OPCODE != END do ;; or EOF begin read a line from the code if there is a label if this label is in SYMTAB, then error else insert (label, LOCCTR) into SYMTAB search OPTAB for the op code if found LOCCTR += N ;; N is the length of this instruction (4 for MIPS) else if this is an assembly directive update LOCCTR as directed else error write line to intermediate file end program size = LOCCTR - starting address;end 9
  10. 10. Assembler Algorithm: pass2begin read a line; if op code = START then ;; .globl xxx for MIPS write header record; while op code != END do ;; or EOF begin search OPTAB for the op code; if found if the operand is a symbol then replace it with an address using SYMTAB; assemble the object code; else if is a defined directive add $t0, $t1, $t2 => convert it to object code; 000000 01001 01010 01000 00000 100000 add object code to the text; read next line; end write End record to the text; output text;end 10
  11. 11. Program Relocation 0 . . . jump to 1004 1004 . . 1076 5000 . jump to 1004 . . jump to 1004 . 6076 program is loaded at 0 program is loaded at 5000• motivations for relocation – a program may consists of several pieces of codes that are assembled independently – when a program is assembled, it is impossible to know the exact location where the program starts 11
  12. 12. Program Relocation (cont’d)• distances from the origin of a program do not change – make the address relative to the origin – provides loader with information about • which address needs fixing • length of address field – the loader change those addresses as • distance + start address of a program – only absolute addresses need to be changed 12
  13. 13. Literals• usage – encoded as an operand (similar to the immediate in MIPS, but different) • load $7, =X’0A7F’ – simple way to declare a constant – assembler does • declare a constant with a label • use the label to use the value• comparison with immediate – literal is an assembler directive • immediate is a machine recognizable data – full word can be used for literals • immediate: full word – (opcode, registers) – values are obtained from data memory - slow • immediate data is within the instruction itself 13
  14. 14. Literals (cont’d)• literal pool – assembler collects all the literals into one or more literal pools – default location is at the end of the program • for better code reading – programmer can declare a place (LTORG) • to use PC-relative addressing • to keep data close to instruction• optimization – make one literal for the same value • compare character string or value? – x’454F46’ = c’EOF’ • value comparison needs evaluation• literal table – name(label), operand value, operand length, address in the table – name and value are all used as a key 14
  15. 15. Literal Handling Algorithmpass 1 at a recognition of a literal search LITTAB by name if found but different value, error else if the same value, no action else if not found insert a new literal (no address yet) if the code is LTORG or END allocate each literal assigning an addresspass 2 replace each literal with the address in the LITTAB if these addresses are absolute, prepare modification for relocation 15
  16. 16. Symbol Defining Statement• MAXLEN EQU 4096 – makes program structure better – easier to modify a single location – easier to remember than numbers – registers can be given meaningful names – (maxlen = 4096) in MIPS• assembler – searches SYMTAB and replace the symbol with the value in the table – resulting object code is the same as using the value instead of symbol – remember that with 2 passes there is restriction X EQU Y Y EQU 100 • X cannot be defined in pass 1 16
  17. 17. ExpressionsBUFFER: .space 4096 ; reserve 4096 bytes hereBUFEND: ; set current location to BUFFEND(MAXLEN = BUFEND – BUFFER) ; calculate the size of the buffer• allows simple arithmetic operations in symbol definition• operands may have relative values for relocation – relative values should be modified by the loader later • we need to know which is relative – symbol table needs a type field to discern absolute symbols from relative symbols 17
  18. 18. Expression Rules• basic – constant is absolute – address is relative• using expressions – expression with absolute arguments is absolute – expression that has multiplication and division is absolute – relative_1 - relative_2 is absolute • dependencies on starting address are canceled out – all the other expressions having relative terms are neither relative nor absolute (error?) • constant - relative • relative_1 + relative_2 • 3 x relative_1 18
  19. 19. Program Blockssource object codeblock 0 block 0block 1 assembledblock 2block 0 block 1block 1 block 2block 2 19
  20. 20. Program Blocks (cont’d)• motivation – programmer’s view may be different from machine’s view • affects only efficiency not functionality – addressing can be simplified • large data area can be moved to the end of code while source code places it close to the instructions that use this data• data structure and algorithm – block table (name, block number, address, length) – pass 1 • maintain separate LOCCTR for each block – each label is assigned address relative to the start of the block that contains it • SYMTAB stores block number for each symbol • store starting address of each block in block table – pass 2 • assign address to each symbol by adding the relative address to the block starting address 20
  21. 21. Control Sections• control section is a part of program that can be assembled independent of other parts – a large problem can be divided into many control sections – each control section can be developed independently – each control section can be modified independently• symbols defined in other control sections – called external – assembler prepares those symbols – loader & linker resolves the value of external symbols 21
  22. 22. Control Sections (cont’d)• a table prepared by assembler – define record • name of symbol defined in this control section • relative address of the symbol – refer record • name of external symbols – modification record • starting address of field to be modified • length of this field • name of external symbol• loader – for every external symbol • find the relative address from the define record • add the starting address of the control section where the symbol is defined • modify the field 22
  23. 23. One-Pass Assembler• problem – forward reference: reference to symbols that are not defined yet• why do we need one-pass assembler? – fast • useful for program development and testing • university computing environment• load-and-go assembler – writes the object code on memory not on disk file – since it is on memory it is easy to modify a part of object code 23
  24. 24. One-Pass Assembler (cont’d)• one-pass assembler for load-and-go – stores undefined symbols in the SYMTAB with the address of the field that references this symbol – when the symbol is defined later, look up the SYMTAB and modify the field with correct address • there may be many places to be modified• what if object code is written on disk? – bring back the text to memory • efficiency of one-pass assembler cannot be justified – make loader to modify the address at loading time • modification record again• optimization – require all the data declaration be placed at the beginning of the program • reduces reference resolution 24
  25. 25. Multi-Pass Assembler • support forwarding reference even though it is bad for program readability at 1, store in a table two tuples (A, 1, B/2, 0) 1: one symbol is missing1.(A = B/2) 0: no other symbol depends on A2.(B = C-D) (B, *, , &LB) .... *: don’t know how many symbols missing yet8. C .....9. D ..… LB: list of symbols that depend on B (now, there is only A in this list) at 2, insert (C,*, ,&LC), (D,*, ,&LD) LC and LD contains only B modify (B,*, ,&LB) as (B,2,C-D,&LB) after 8 from LC, B is found change 2 to 1 in the B tuple meaning one symbol remains to be defined after 9 from LD, B is found now evaluate B with defined C, D values since B is done from LB, A is found now A can be evaluated 25

×