
UNIT 3 Assembler

Elements of Assembly Language Programming

An assembly language is a machine-dependent, low-level programming language that is specific to a certain computer system. Compared to the machine language of a computer system, it provides three basic features which simplify programming:

1) Mnemonic operation codes: Eliminate the need to memorize numeric operation codes, and enable the assembler to provide helpful diagnostics (for example, on a misspelled opcode).
2) Symbolic operands: Symbolic names can be associated with data or instructions and used as operands in assembly statements. The assembler performs memory binding, so the programmer need not know any details of the memory bindings.
3) Data declaration: Data can be declared in a variety of notations, including decimal notation. This avoids manual conversion of constants into their internal machine representation.

Statement format

[Label] <opcode> <operand spec> [,<operand spec>]

Label: a symbolic name associated with the memory word generated for the statement.

A simple assembly language

In each instruction, the first operand is a register and the second operand refers to a memory word.

Instruction opcode    Assembly mnemonic
00                    STOP
01                    ADD
02                    SUB
03                    MULT
04                    MOVER
05                    MOVEM
06                    COMP
07                    BC
08                    DIV
09                    READ
10                    PRINT

An assembly and equivalent machine language program

The following program computes N!

        START   101
        READ    N               101) 09 0 113
        MOVER   BREG, ONE       102) 04 2 115
        MOVEM   BREG, TERM      103) 05 2 116
AGAIN   MULT    BREG, TERM      104) 03 2 116
        MOVER   CREG, TERM      105) 04 3 116
        ADD     CREG, ONE       106) 01 3 115
        MOVEM   CREG, TERM      107) 05 3 116
        COMP    CREG, N         108) 06 3 113
        BC      LE, AGAIN       109) 07 2 104
        MOVEM   BREG, RESULT    110) 05 2 114
        PRINT   RESULT          111) 10 0 114
        STOP                    112) 00 0 000
N       DS      1               113)
RESULT  DS      1               114)
ONE     DC      1               115) 00 0 001
TERM    DS      1               116)
        END

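COMP: sets the condition code without affecting the values of its operands.
BC: tests the condition code and branches to the given address if the test succeeds.

BC <condition code spec>, <memory address>

The condition code specifications LT, LE, EQ, GT, GE and ANY are encoded as 1, 2, 3, 4, 5 and 6 respectively.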
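The opcode table and the condition codes above are enough to encode a single instruction by hand. The following is a minimal sketch, assuming the register codes AREG to DREG = 1 to 4 and a three-field "opcode register address" layout as in the N! listing; the function name encode is hypothetical.

```python
# Mnemonic table of the simple assembly language (mnemonic -> opcode).
OPCODES = {
    "STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5,
    "COMP": 6, "BC": 7, "DIV": 8, "READ": 9, "PRINT": 10,
}
# Register and condition-code encodings; both occupy the same one-digit field.
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}
CONDITIONS = {"LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}


def encode(mnemonic, first_operand, address):
    """Return the 'opcode register address' form of one machine instruction."""
    opcode = OPCODES[mnemonic]
    # BC carries a condition code where other instructions carry a register.
    table = CONDITIONS if mnemonic == "BC" else REGISTERS
    field = table[first_operand] if first_operand else 0
    return f"{opcode:02d} {field} {address:03d}"


print(encode("BC", "LE", 104))       # 07 2 104
print(encode("MOVER", "BREG", 115))  # 04 2 115
```

Both printed words match the corresponding lines of the N! program, where AGAIN is at address 104 and ONE at 115.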

Assembly Language Statements:

1. Imperative Statements (IS)
2. Declaration Statements (DL)
3. Assembler Directives (AD)

Imperative Statements

An imperative statement indicates an action to be performed during the execution of the assembled program. Each imperative statement translates into one machine instruction.

Declaration Statements

Syntax:

[label] DS <constant>
[label] DC <value>

DS = Declare Storage
DC = Declare Constant

Example of declaration statements:

A       DS      1
G       DS      200

The first statement reserves a memory area of 1 word and associates the name A with it. The second statement reserves a block of 200 memory words. The name G is associated with the first word of the block. Other words in the block can be accessed through offsets from G, e.g. G+5 is the sixth word of the memory block.

Example of a DC statement:

ONE     DC      1

This statement associates the name ONE with a memory word containing the value 1. A DC statement does not really implement constants; it merely initializes memory words to given values. These values are not protected by the assembler, and they may be changed by moving a new value into the memory word.

Use of constants

Immediate operands: supported only if the architecture of the target machine includes the necessary features.

Literals: A literal is an operand with the syntax =<value>. It differs from a constant because its location cannot be specified in the assembly program. This helps to ensure that its value is not changed during the execution of the program.

The three ways of supplying the value 5 are:

        ADD     AREG, =5        (literal)

        ADD     AREG, FIVE      (constant declared with DC)
FIVE    DC      5

        ADD     AREG, 5         (immediate operand)

Assembler Directives

START <constant>

o This directive indicates that the first word of the target program generated by the assembler should be placed in the memory word with address <constant>.
o The Location Counter (LC) is initialized to the constant specified in the START statement.

END [<operand spec>]

This directive indicates the end of the source program. The optional <operand spec> indicates the address of the instruction where the execution of the program should begin.

Advantages of Assembly Language over machine language

1) Use of symbolic operand specification

        START   101
        READ    N               101) 09 0 114
        MOVER   BREG, ONE       102) 04 2 116
        MOVEM   BREG, TERM      103) 05 2 117
AGAIN   MULT    BREG, TERM      104) 03 2 117
        MOVER   CREG, TERM      105) 04 3 117
        ADD     CREG, ONE       106) 01 3 116
        MOVEM   CREG, TERM      107) 05 3 117
        COMP    CREG, N         108) 06 3 114
        BC      LE, AGAIN       109) 07 2 104
        DIV     BREG, TWO       110) 08 2 118
        MOVEM   BREG, RESULT    111) 05 2 115
        PRINT   RESULT          112) 10 0 115
        STOP                    113) 00 0 000
N       DS      1               114)
RESULT  DS      1               115)
ONE     DC      1               116)
TERM    DS      1               117)
TWO     DC      2               118)
        END

Compared with the earlier N! program, inserting the DIV statement and the constant TWO shifts the addresses of N, RESULT, ONE and TERM by one word. Because the operands are symbolic, no other source statement has to be changed; the assembler recomputes the addresses.

A SIMPLE ASSEMBLY SCHEME

Design specification of an assembler

1) Identify the information necessary to perform a task.
2) Design a suitable data structure to record the information.
3) Determine the processing necessary to obtain and maintain the information.
4) Determine the processing necessary to perform the task.

Language processing = Analysis of the source program (SP) + Synthesis of the target program (TP)

Information is collected either during the analysis phase or during the synthesis phase.

An assembler contains:

1. Analysis phase
2. Synthesis phase

Synthesis Phase

Consider the assembly statement

        MOVER   BREG, ONE

To synthesize a machine instruction we must have the following information:

1. The address of the memory word with which the name ONE is associated.
2. The machine operation code corresponding to the mnemonic MOVER.

The address of ONE depends on the source program, hence it must be made available by the analysis phase. The second item does not depend on the source program, so the synthesis phase can determine it for itself.

During the synthesis phase two data structures are used:

1. Symbol table
2. Mnemonics table

Symbol table: Each entry of the symbol table has two primary fields, name and address. The table is built by the analysis phase.

Mnemonics table: An entry in the mnemonics table has two primary fields, mnemonic and opcode.

Hence the tables have to be searched with the symbol name and the mnemonic as keys during synthesis.

Analysis Phase

Primary function: building the symbol table. For this it must determine the addresses of the symbolic names used in the program. To determine the address of N, we must fix the addresses of all program elements preceding it. This is called memory allocation.

How is memory allocation implemented? A data structure called the location counter (LC) is introduced. The LC always contains the address of the next memory word in the target program. It is initialized to the constant specified in the START statement. Whenever the analysis phase sees a label in an assembly statement, it enters the label and the contents of LC in a new entry of the symbol table. It then finds the number of memory words required by the assembly statement and updates the LC contents.

To update the LC so that it points to the next memory word in the TP (also for DS/DC statements), the analysis phase needs to know the lengths of the different instructions. Hence the mnemonics table can be extended to include this information in a new field called length. The process of maintaining the location counter is called LC processing.
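LC processing as described above can be sketched in a few lines. This is a minimal illustration, assuming statements are already split into (label, mnemonic, operand) tuples and that every imperative statement and every DC occupies one word; the names statement_length and build_symbol_table are hypothetical.

```python
def statement_length(mnemonic, operand):
    """Number of memory words a statement occupies in the target program."""
    if mnemonic == "DS":
        return int(operand)      # DS n reserves n words
    if mnemonic in ("START", "END"):
        return 0                 # directives generate no memory words
    return 1                     # imperative statements and DC (assumed one word)


def build_symbol_table(statements):
    """Enter (label, LC contents) for every labelled statement."""
    symtab, lc = {}, 0
    for label, mnemonic, operand in statements:
        if mnemonic == "START":
            lc = int(operand)    # LC is initialized from the START operand
            continue
        if label:
            symtab[label] = lc
        lc += statement_length(mnemonic, operand)
    return symtab


# The N! program from the beginning of this unit.
PROGRAM = [
    (None, "START", "101"),
    (None, "READ", "N"),
    (None, "MOVER", "BREG, ONE"),
    (None, "MOVEM", "BREG, TERM"),
    ("AGAIN", "MULT", "BREG, TERM"),
    (None, "MOVER", "CREG, TERM"),
    (None, "ADD", "CREG, ONE"),
    (None, "MOVEM", "CREG, TERM"),
    (None, "COMP", "CREG, N"),
    (None, "BC", "LE, AGAIN"),
    (None, "MOVEM", "BREG, RESULT"),
    (None, "PRINT", "RESULT"),
    (None, "STOP", ""),
    ("N", "DS", "1"),
    ("RESULT", "DS", "1"),
    ("ONE", "DC", "1"),
    ("TERM", "DS", "1"),
    (None, "END", ""),
]

print(build_symbol_table(PROGRAM))
# {'AGAIN': 104, 'N': 113, 'RESULT': 114, 'ONE': 115, 'TERM': 116}
```

The resulting addresses agree with the operand addresses in the machine language form of the N! program.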

Data Structures of the Assembler

Mnemonics table: a fixed table, accessed by both the analysis and synthesis phases.
Symbol table: constructed during analysis and used during synthesis.

Tasks performed by the analysis and synthesis phases are as follows:

Analysis Phase
1. Isolate the label, mnemonic opcode and operand fields of a statement.
2. If a label is present, enter the pair (symbol, <LC contents>) in a new entry of the symbol table.
3. Check the validity of the mnemonic opcode through a look-up in the mnemonics table.
4. Perform LC processing, i.e. update the value in LC by considering the opcode and operands of the statement.

Synthesis Phase
1. Obtain the machine opcode corresponding to the mnemonic from the mnemonics table.
2. Obtain the address of a memory operand from the symbol table.
3. Synthesize a machine instruction or the machine form of a constant, as the case may be.

PASS STRUCTURES OF ASSEMBLERS

SINGLE PASS TRANSLATION

Problem: forward references
Solution: backpatching

LC processing and construction of the symbol table are performed in this single pass.

What is backpatching? The operand field of an instruction containing a forward reference is left blank initially. The address of the forward-referenced symbol is put into this field when its definition is encountered. For example, the instruction

        MOVER   BREG, ONE

can be only partially synthesized, since ONE is a forward reference.

1. A Table of Incomplete Information (TII) is created to record the instructions whose operands are forward references.
2. Each TII entry is a pair (<instruction address>, <symbol>), for example (101, ONE), where 101 is the instruction address.
3. By the time the END statement is processed, the symbol table contains the addresses of all symbols defined in the SP, and the TII contains information describing all forward references.
4. The assembler can now process each entry in the TII to complete the concerned instruction.
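A rough sketch of backpatching with a TII follows. It assumes statements are pre-split into (label, mnemonic, register, symbol) tuples, one word per instruction, and a short hypothetical program; assemble_single_pass and its arguments are illustrative names, not part of any real assembler.

```python
def assemble_single_pass(statements, opcodes, registers, start):
    """Single-pass translation that resolves forward references by backpatching."""
    symtab = {}
    tii = []    # Table of Incomplete Information: (instruction address, symbol)
    code = {}   # address -> [opcode, register, operand address or None]
    lc = start
    for label, mnemonic, register, symbol in statements:
        if label:
            symtab[label] = lc
        if mnemonic == "DS":
            lc += int(symbol)               # reserve storage, no code generated
            continue
        if mnemonic == "DC":
            code[lc] = [0, 0, int(symbol)]  # machine form of the constant
        else:
            address = symtab.get(symbol) if symbol else 0
            if symbol and address is None:
                tii.append((lc, symbol))    # forward reference: leave field blank
            code[lc] = [opcodes[mnemonic], registers.get(register, 0), address]
        lc += 1
    # At END every symbol is defined, so each blank operand field can be filled.
    for address, symbol in tii:
        code[address][2] = symtab[symbol]
    return code, tii


OPCODES = {"STOP": 0, "MOVER": 4, "MOVEM": 5}
REGISTERS = {"BREG": 2}
program = [
    (None, "MOVER", "BREG", "ONE"),
    (None, "MOVEM", "BREG", "TERM"),
    (None, "STOP", None, None),
    ("ONE", "DC", None, "1"),
    ("TERM", "DS", None, "1"),
]
code, tii = assemble_single_pass(program, OPCODES, REGISTERS, start=101)
print(tii)        # [(101, 'ONE'), (102, 'TERM')]
print(code[101])  # [4, 2, 104]
```

The first TII entry is the pair (101, ONE) used as the example in the text; after backpatching, the instruction at 101 refers to address 104, where ONE was eventually defined.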

TWO PASS TRANSLATION

Two pass translation can handle forward references easily. The first pass performs analysis of the SP, while the second pass performs synthesis of the TP.

In the first pass:
1. LC processing is performed, and
2. symbols defined in the program are entered into the symbol table.

Pass I produces an intermediate representation (IR) of the SP, which is used by Pass II. The intermediate representation consists of two main components:
1. Data structures (e.g. the symbol table)
2. Intermediate code (IC)

In the second pass:
1. The target form is synthesized using the address information found in the symbol table.

Design of a Two Pass Assembler

Tasks performed by the passes of a two pass assembler are:

Pass I:
1. Separate the symbol, mnemonic opcode and operand fields.
2. Build the symbol table.
3. Perform LC processing.
4. Construct the intermediate representation.

Pass II:
1. Synthesize the target program.

Thus, Pass I performs analysis of the SP and synthesis of the IR, while Pass II processes the IR to synthesize the TP.
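The division of work between the two passes can be sketched as follows. This is an illustration under simplifying assumptions: one word per instruction, operands already split into lists, a made-up four-statement program, and an IR that simply records (address, mnemonic, operands); pass_one and pass_two are hypothetical names.

```python
OPCODES = {"STOP": 0, "MOVER": 4, "BC": 7, "READ": 9}
# Codes for the first operand: registers and condition codes share one field.
FIRST_OPERAND = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4,
                 "LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}


def pass_one(source):
    """Analysis: LC processing, symbol table and intermediate representation."""
    symtab, ir, lc = {}, [], 0
    for label, mnemonic, operands in source:
        if mnemonic == "START":
            lc = int(operands[0])
            continue
        if mnemonic == "END":
            break
        if label:
            symtab[label] = lc
        if mnemonic == "DS":
            lc += int(operands[0])
            continue
        ir.append((lc, mnemonic, operands))
        lc += 1
    return symtab, ir


def pass_two(symtab, ir):
    """Synthesis: every symbol is already in symtab, so no reference is 'forward'."""
    target = []
    for address, mnemonic, operands in ir:
        first = FIRST_OPERAND[operands[0]] if len(operands) == 2 else 0
        memory = symtab[operands[-1]] if operands else 0
        target.append(f"{address}) {OPCODES[mnemonic]:02d} {first} {memory:03d}")
    return target


SOURCE = [
    (None, "START", ["101"]),
    (None, "READ", ["N"]),           # forward reference to N
    ("LOOP", "MOVER", ["BREG", "N"]),
    (None, "BC", ["ANY", "LOOP"]),   # backward reference to LOOP
    (None, "STOP", []),
    ("N", "DS", ["1"]),
    (None, "END", []),
]

symtab, ir = pass_one(SOURCE)
for line in pass_two(symtab, ir):
    print(line)
# 101) 09 0 105
# 102) 04 2 105
# 103) 07 6 102
# 104) 00 0 000
```

Because Pass II starts only after Pass I has seen the whole program, the forward reference to N needs no special treatment.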

Advanced Assembler Directives:

1. ORIGIN
2. EQU
3. LTORG

ORIGIN

The syntax of this directive is

ORIGIN <address spec>

where <address spec> is an <operand spec> or <constant>. This directive indicates that LC should be set to the address given by <address spec>. It is useful when the target program does not consist of consecutive memory words, and it provides the ability to perform LC processing in a relative rather than absolute manner.

        START   200
        MOVER   AREG, =5        200) 04 1 211
        MOVEM   AREG, A         201) 05 1 217
LOOP    MOVER   AREG, A         202) 04 1 217
        MOVER   CREG, B         203) 04 3 218
        ADD     CREG, =1        204) 01 3 212
        ...
        BC      ANY, NEXT       210) 07 6 214
        LTORG
                =5              211) 00 0 005
                =1              212) 00 0 001
        ...
NEXT    SUB     AREG, =1        214) 02 1 219
        BC      LT, BACK        215) 07 1 202
LAST    STOP                    216) 00 0 000
        ORIGIN  LOOP+2
        MULT    CREG, B         204) 03 3 218
        ORIGIN  LAST+1
A       DS      1               217)
BACK    EQU     LOOP
B       DS      1               218)
        END
                =1              219) 00 0 001

EQU

Syntax:

<symbol> EQU <address spec>

where <address spec> is an <operand spec> or <constant>. The EQU statement simply associates the name <symbol> with <address spec>.

Example:

        START   200
        MOVER   AREG, =5
        MOVEM   AREG, A
LOOP    MOVER   AREG, A         202)
        ADD     CREG, B         203)
        BC      LT, BACK
BACK    EQU     LOOP            202)

The statement BC LT, BACK is assembled as + 07 1 202, since BACK stands for the address of LOOP, which is 202. No LC processing is implied by EQU, which makes it different from DS/DC.

LTORG

A literal can be handled in two steps. First, the literal is treated as if it were a <value> in a DC statement, i.e. a memory word containing the value of the literal is formed. Second, this memory word is used as the operand in place of the literal.

Where should the literal be placed? It should be placed such that control never reaches it during the execution of the program.

By default, literals are placed after the END statement. The LTORG statement permits the programmer to specify where literals should be placed. At every LTORG statement, and at the END statement, the assembler allocates memory to the literals of a literal pool. The pool contains all literals used in the program since the start of the program or since the last LTORG statement.

Example:

        START   200
        MOVER   AREG, =5        200) +04 1 211
        ...
        ADD     CREG, =1        204) +01 3 212
        ...
        LTORG
                =5              211) +00 0 005
                =1              212) +00 0 001
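Allocation of literal pools at LTORG and END can be sketched as below. The sketch assumes one word per imperative statement, at most one literal per statement, and 0-based table indices (the notes number literals from 1); allocate_literals and the short statement list are hypothetical.

```python
def allocate_literals(statements, start):
    """statements: list of (mnemonic, literal or None) pairs."""
    littab = []      # entries [literal, address]
    pooltab = [0]    # index in littab of the first literal of each pool
    lc = start
    for mnemonic, literal in statements:
        if mnemonic in ("LTORG", "END"):
            # Allocate the literals of the current pool starting at the LC.
            for entry in littab[pooltab[-1]:]:
                entry[1] = lc
                lc += 1
            pooltab.append(len(littab))   # the next pool starts here
            continue
        if literal is not None:
            littab.append([literal, None])
        lc += 1
    return littab, pooltab


statements = [
    ("MOVER", "=5"),
    ("ADD", "=1"),
    ("LTORG", None),
    ("SUB", "=1"),
    ("STOP", None),
    ("END", None),
]
littab, pooltab = allocate_literals(statements, start=200)
print(littab)   # [['=5', 202], ['=1', 203], ['=1', 206]]
print(pooltab)  # [0, 2, 3]
```

The second =1 gets its own word in the second pool, because a pool only contains the literals used since the previous LTORG statement.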

All references to literals are forward references by definition.

PASS I OF THE ASSEMBLER

Pass I uses the following data structures:

1. OPTAB: a table of mnemonic opcodes and related information.
2. SYMTAB: the symbol table.
3. LITTAB: a table of literals used in the program.
4. POOLTAB: a table giving the first literal of each literal pool.

OPTAB
mnemonic opcode    class    mnemonic info
MOVER              IS       (04,1)
DS                 DL       R#7
START              AD       R#11

SYMTAB
symbol    address    length
LOOP      202        1
NEXT      214        1
LAST      216        1
A         217        1
BACK      202        1
B         218        1

LITTAB
no.    literal    address
1      =5
2      =1
3      =1

POOLTAB
literal no.
#1
#3
--

OPTAB contains the fields mnemonic opcode, class and mnemonic info. The class field indicates whether the opcode corresponds to:

IS    Imperative Statement
DL    Declaration Statement
AD    Assembler Directive

For an imperative statement, the mnemonic info field contains the pair (machine opcode, instruction length). For a declaration statement or an assembler directive, it contains the id of a routine that handles the declaration or directive.

SYMTAB contains the fields address and length. LITTAB contains the fields literal and address.

Pass I centers around the interpretation of the OPTAB entry for the mnemonic:
o For an imperative statement, the length of the machine instruction is added to the LC.
o For a declaration statement or assembler directive, the routine named in the mnemonic info field is called to perform the appropriate processing, determine the memory requirement, and update the LC and SYMTAB accordingly.
o When an LTORG statement (or the END statement) is encountered, the literals in the current pool are allocated addresses starting with the current value in LC, and LC is incremented accordingly.

STEPS

1. Processing of an assembly statement begins with the processing of its label field. If it contains a symbol, the symbol and the value in LC are copied into a new entry of SYMTAB.
2. Thereafter Pass I interprets the OPTAB entry for the mnemonic. The class field of the entry is examined to determine whether the mnemonic belongs to the class of imperative statements, declaration statements or assembler directives.
3. If it is an imperative statement, the length of the machine instruction is added to the LC. The length is also entered in the SYMTAB entry of the symbol defined in the statement, if there is one.
4. For a declaration statement or assembler directive, the routine mentioned in the mnemonic info field is called to perform appropriate processing of the statement. For example, in the case of a DS statement, routine R#7 would be called.
5. The first pass uses LITTAB to collect all literals used in a program. It also uses POOLTAB, which contains the literal number of the starting literal of each literal pool. At an LTORG statement or the END statement, the literals in the current pool are allocated addresses starting with the current value in LC, and LC is incremented accordingly.
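Steps 1 to 4 above amount to a table-driven dispatch on the OPTAB class field, which can be sketched as follows. Literal handling (step 5) is left out, and the handler functions only stand in for routines such as R#7 and R#11; their names, the OPTAB contents and the sample program are assumptions made for this illustration.

```python
def reserve_storage(operand, state):   # plays the role of routine R#7 (DS)
    state["lc"] += int(operand)


def define_constant(operand, state):   # DC: one word holding the value
    state["lc"] += 1


def start(operand, state):             # plays the role of routine R#11 (START)
    state["lc"] = int(operand)


def end(operand, state):               # END: nothing to allocate in this sketch
    pass


# class IS -> (machine opcode, instruction length); DL and AD -> handler routine
OPTAB = {
    "MOVER": ("IS", (4, 1)),
    "ADD": ("IS", (1, 1)),
    "STOP": ("IS", (0, 1)),
    "DS": ("DL", reserve_storage),
    "DC": ("DL", define_constant),
    "START": ("AD", start),
    "END": ("AD", end),
}


def pass_one(statements):
    """Build SYMTAB (address and length) by interpreting OPTAB entries."""
    state, symtab = {"lc": 0}, {}
    for label, mnemonic, operand in statements:
        address = state["lc"]
        statement_class, info = OPTAB[mnemonic]
        if statement_class == "IS":
            _opcode, length = info
            state["lc"] += length
        else:
            info(operand, state)                 # call the routine for DL / AD
            length = state["lc"] - address
        if label:
            symtab[label] = {"address": address, "length": length}
    return symtab


program = [
    (None, "START", "200"),
    ("LOOP", "MOVER", "AREG, A"),
    (None, "ADD", "AREG, B"),
    (None, "STOP", ""),
    ("A", "DS", "1"),
    ("G", "DS", "200"),
    ("B", "DC", "1"),
    (None, "END", ""),
]
for name, entry in pass_one(program).items():
    print(name, entry["address"], entry["length"])
# LOOP 200 1
# A 203 1
# G 204 200
# B 404 1
```

Keeping the per-statement behaviour in the table means a new directive only needs a new OPTAB entry and handler; the main loop is unchanged.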

Algorithm (Assembler First Pass)

1. loc_cntr := 0 (default value);
   pooltab_ptr := 1; POOLTAB[1] := 1;
   littab_ptr := 1;
2. While next statement is not an END statement
   (a) If label is present then
       this_label := symbol in label field;
       Enter (this_label, loc_cntr) in SYMTAB.
   (b) If an LTORG statement then
       (i) Process literals LITTAB[POOLTAB[pooltab_ptr]] ... LITTAB[littab_ptr - 1] to allocate memory and put the address in the address field. Update loc_cntr accordingly.
       (ii) pooltab_ptr := pooltab_ptr + 1;
       (iii) POOLTAB[pooltab_ptr] := littab_ptr;
   (c) If a START or ORIGIN statement then
       loc_cntr := value specified in operand field;
   (d) If an EQU statement then
       (i) this_addr := value of <address spec>;
       (ii) Correct the SYMTAB entry for this_label to (this_label, this_addr).
   (e) If a declaration statement then
       (i) code := code of the declaration statement;
       (ii) size := size of memory area required by DC/DS;
       (iii) loc_cntr := loc_cntr + size;
       (iv) Generate IC (DL, code) ...
   (f) If an imperative statement then
       (i) code := machine opcode from OPTAB;
       (ii) loc_cntr := loc_cntr + instruction length from OPTAB;
       (iii) If operand is a literal then
                 this_literal := literal in operand field;
                 LITTAB[littab_ptr] := this_literal;
                 littab_ptr := littab_ptr + 1;
             else (i.e. operand is a symbol)
                 this_entry := SYMTAB entry number of operand;
                 Generate IC (IS, code)(S, this_entry);
3. (Processing of END statement)
   (a) Perform step 2(b).
   (b) Generate IC (AD, 02).
   (c) Go to Pass II.

INTERMEDIATE CODE FORMS

We consider some variants of intermediate code and compare them on the basis of processing efficiency and memory economy.

The intermediate code consists of a set of IC units. Each IC unit consists of the following three fields:
1. Address
2. Representation of the mnemonic opcode
3. Representation of operands

The mnemonic field contains a pair of the form (statement class, code). The statement class can be IS, DL or AD. For an imperative statement, code is the instruction opcode in the machine language. For declaration statements and assembler directives, code is an ordinal number within the class. Thus, (AD, 01) stands for assembler directive number 1, which is the directive START.

Codes for declaration statements and directives

Declaration statements
DC        01
DS        02

Assembler directives
START     01
END       02
ORIGIN    03
EQU       04
LTORG     05

VARIANT I

The first operand is represented by a single-digit number, which is the code of a register (1 to 4 for AREG to DREG) or of a condition code (1 to 6 for LT to ANY).

The second operand, which is a memory operand, is represented in the form (operand class, code).

Operand class:
1. C - Constants
2. S - Symbols
3. L - Literals

For a constant, the code field contains the constant itself, e.g. the operand of START 200 is (C, 200). For a symbol or literal, the code field contains the ordinal number of the operand's entry in SYMTAB or LITTAB. For example, a symbol XYZ and a literal =25 would be represented in the forms (S, 17) and (L, 25).

For a forward reference such as MOVER AREG, A:
1. It is necessary to enter A in SYMTAB when it is first referenced.
2. Only then can it be represented by (S, n) in the IC.

        START   200             (AD,01) (C,200)
        READ    A               (IS,09) (S,01)
LOOP    MOVER   AREG, A         (IS,04) (1)(S,01)
        ...
        SUB     AREG, =1        (IS,02) (1)(L,01)
        BC      GT, LOOP        (IS,07) (4)(S,02)
        STOP                    (IS,00)
A       DS      1               (DL,02) (C,1)
        LTORG                   (AD,05)

At the moment a forward reference is entered, its address and length fields in SYMTAB cannot be filled in. This means that at any time two kinds of entries exist in SYMTAB: defined symbols and forward references. This fact helps error detection.

VARIANT II

How does it differ from Variant I? Here, the operand fields of source statements are selectively replaced by their processed forms.
o For DL and AD statements, processing of the operand field is necessary to support LC processing.
o For IS statements, the operand field is processed only to identify literal references. Literals are entered in LITTAB and represented as (L, m) in the IC.
o Symbolic references in source statements are not processed at all during Pass I.

        START   200             (AD,01) (C,200)
        READ    A               (IS,09) A
LOOP    MOVER   AREG, A         (IS,04) AREG, A
        ...
        SUB     AREG, =1        (IS,02) AREG, (L,01)
        BC      GT, LOOP        (IS,07) GT, LOOP
        STOP                    (IS,00)
A       DS      1               (DL,02) (C,1)
        LTORG                   (AD,05)
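The difference between the two variants can be made concrete with a small generator that emits either form for the example program. It is only a sketch: the elided statements of the example are left out, the address field of the IC units is omitted, LTORG is given the directive code 05 from the code table, and intermediate_code and entry_number are hypothetical names.

```python
OPCODES = {"STOP": 0, "SUB": 2, "MOVER": 4, "BC": 7, "READ": 9}
FIRST = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4,
         "LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}


def entry_number(table, item):
    """1-based position of item in table, appending it on first sight."""
    if item not in table:
        table.append(item)
    return table.index(item) + 1


def intermediate_code(statements, variant):
    symtab, littab, ic = [], [], []
    for label, mnemonic, operands in statements:
        if label:
            entry_number(symtab, label)
        if mnemonic == "START":
            ic.append(f"(AD,01) (C,{operands[0]})")
        elif mnemonic == "LTORG":
            ic.append("(AD,05)")
        elif mnemonic == "DS":
            ic.append(f"(DL,02) (C,{operands[0]})")
        else:
            fields = []
            for position, operand in enumerate(operands):
                is_memory = position == len(operands) - 1
                if operand.startswith("="):
                    littab.append(operand)          # both variants process literals
                    fields.append(f"(L,{len(littab):02d})")
                elif variant == 2:
                    fields.append(operand)          # left in source form
                elif is_memory:
                    fields.append(f"(S,{entry_number(symtab, operand):02d})")
                else:
                    fields.append(f"({FIRST[operand]})")
            separator = "," if variant == 2 else ""
            unit = f"(IS,{OPCODES[mnemonic]:02d}) " + separator.join(fields)
            ic.append(unit.rstrip())
    return ic


PROGRAM = [
    (None, "START", ["200"]),
    (None, "READ", ["A"]),
    ("LOOP", "MOVER", ["AREG", "A"]),
    (None, "SUB", ["AREG", "=1"]),
    (None, "BC", ["GT", "LOOP"]),
    (None, "STOP", []),
    ("A", "DS", ["1"]),
    (None, "LTORG", []),
]
print(intermediate_code(PROGRAM, variant=1)[2])  # (IS,04) (1)(S,01)
print(intermediate_code(PROGRAM, variant=2)[2])  # (IS,04) AREG,A
```

In Variant I the symbol A receives SYMTAB entry 1 at its first (forward) reference and LOOP receives entry 2, which is why BC GT, LOOP becomes (IS,07) (4)(S,02).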

COMPARISON BETWEEN VARIANT I AND VARIANT II

VARIANT I
o Requires extra work in Pass I, since operand fields are completely processed.
o Simplifies the task of Pass II.
o The IC is quite compact, since an operand reference like (S, n) can be represented in the same number of bits as an operand address in a machine instruction.
o Pass I occupies more memory than Pass II because it performs more work.

VARIANT II
o Reduces the work of Pass I by transferring the burden of operand processing from Pass I to Pass II, so the work is shared more evenly between the passes.
o The IC is less compact because the memory operand of an imperative statement is kept in source form.
o Pass I and Pass II occupy almost equal memory.
o Well suited to situations where expressions are permitted in operand fields. For example, MOVER AREG, A+5 would appear as (IS,04) (1)(S,01)+5 in Variant I, which neither simplifies Pass II nor saves much memory; in such cases it is preferable not to process the operand field at all.

MEMORY REQUIREMENT:

PROCESSING OF DECLARATIONS AND ASSEMBLER DIRECTIVES

Our focus is to identify alternative ways of processing declaration statements and assembler directives. This depends on the answers to two related questions:

1. Is it necessary to represent the address of each source statement in the IC?
2. Is it necessary to have an explicit representation of DS statements and assembler directives in the IC?

Consider the following code and its IC:

        START   200             -----) (AD,01) (C,200)
AREA    DS      20              200)   (DL,02) (C,20)
SIZE    DC      5               220)   (DL,01) (C,5)

Here it is redundant to have a representation of the START and DS statements in the IC, because the address field already shows where SIZE is placed.

Thus, it is not necessary to represent DS and START statements in the IC if the IC contains an address field. If the address field of the IC is omitted, a representation of DS statements and assembler directives becomes essential; Pass II can then determine the address of SIZE only after analyzing the IC units for the START and DS statements. If the address of each source statement is present in the IC, processing of the START and DS statements in Pass II is avoided. This is a space-time tradeoff.

DC statement
1. A DC statement must be represented in the IC.
2. If a DC statement defines many constants, e.g. DC 5, 3, -7, a series of (DL,01) units can be put in the IC.

START and ORIGIN
1. These directives set new values into the LC.
2. It is not necessary to retain START and ORIGIN statements in the IC if the IC contains an address field.

LTORG

Pass I checks for the presence of a literal reference in the operand field of every statement. If one exists, it enters the literal in the current literal pool in LITTAB. When an LTORG statement appears in the source program, it assigns memory addresses to the literals in the current pool.

Pass I can simply construct an IC unit for the LTORG statement; the values of the literals are then inserted in the target program when this IC unit is processed in Pass II. Literals of the first pool are copied into the target program when the IC unit for LTORG is encountered in Pass II, and those of the second pool once the END statement is encountered.

Alternatively, Pass I can itself copy the literals of the pool into the IC, in the same form as DC statements:

        LTORG                   (DL,01) (C,5)
                                (DL,01) (C,1)

The unit (DL,01) is the same as for all DC statements, so Pass II needs no special processing. However, this alternative increases the tasks performed by Pass I and hence its size, which can lead to an unbalanced pass structure.

Algorithm for PASS II

Target code is to be assembled in the area named code_area.

Algorithm (Assembler Second Pass)

1. code_area_address := address of code_area;
   pooltab_ptr := 1;
   loc_cntr := 0;
2. While next statement is not an END statement
   (a) Clear machine_code_buffer;
   (b) If an LTORG statement
       (i) Process literals in LITTAB[POOLTAB[pooltab_ptr]] ... LITTAB[POOLTAB[pooltab_ptr + 1]] - 1 similar to the processing of constants in a DC statement, i.e. assemble the literals in machine_code_buffer;
       (ii) size := size of memory area required for literals;
       (iii) pooltab_ptr := pooltab_ptr + 1;
   (c) If a START or ORIGIN statement then
       (i) loc_cntr := value specified in operand field;
       (ii) size := 0;
   (d) If a declaration statement
       (i) If a DC statement then assemble the constant in machine_code_buffer;
       (ii) size := size of memory area required by DC/DS;
   (e) If an imperative statement
       (i) Get operand address from SYMTAB or LITTAB;
       (ii) Assemble instruction in machine_code_buffer;
       (iii) size := size of instruction;
   (f) If size != 0 then
       (i) Move contents of machine_code_buffer to the address code_area_address + loc_cntr;
       (ii) loc_cntr := loc_cntr + size;
3. (Processing of END statement)
   (a) Perform steps 2(b) and 2(f).
   (b) Write code_area into the output file.

Output Interface of the Assembler

1. The output can be a target program in the machine language of the particular machine (CPU).
2. This is not always the case, however.
3. Generally the assembler produces an object module in the format required by a linkage editor or loader.

LISTING AND ERROR REPORTING:

Should the program listing and error reports be produced in Pass I, or should these actions be delayed until Pass II?

1. Reporting errors in the first pass has the advantage that the source program need not be preserved until Pass II.
2. Advantage: this avoids some amount of duplicate processing.
3. Disadvantage: a listing produced in Pass I can report only certain errors against the offending statement, such as syntax errors (missing commas or parentheses) and some semantic errors.
4. It cannot report errors such as references to undefined symbols at the right place, since these become known only after the whole program has been scanned.

Recommended: delay the program listing and error reporting until Pass II.

ERROR REPORTING PASS I

ERROR REPORTING PASS II

