
Unit 2 Assemblers

Structure:
2.1 Introduction
    Objectives
2.2 Assembly Language
2.3 Basic Assembler Functions
2.4 Design Specification of Assembler
    2.4.1 Data Structures
    2.4.2 Pass1 & pass2 Assembler flow charts
2.5 Tasks of Assemblers
2.6 Translation of Assemblers
2.7 Examples: MASM Assembler and SPARC Assembler
2.8 Summary
2.9 Terminal Questions

2.1 Introduction

An assembler is a program (system software) which accepts an assembly language program as input and produces its equivalent machine language program as output, along with information for the loader. The input to the assembler is called the source program and the output is called the object program.

Fig. 2.0: Function of an Assembler

An assembly language is a machine-dependent, low-level programming language which is specific to a certain computer system (or a family of computer systems). Compared to the machine language of a computer system, it provides three basic features which simplify programming:

1. Mnemonic operation codes: Use of mnemonic operation codes (also called mnemonic opcodes) for machine instructions eliminates the need to memorize numeric operation codes. It also enables the assembler to provide helpful diagnostics, for example an indication of misspelled operation codes.

2. Symbolic operands: Symbolic names can be associated with data or instructions. These symbolic names can be used as operands in assembly statements. The assembler performs memory bindings for these names; the programmer need not know any details of the memory bindings performed by the assembler.

3. Data declarations: Data can be declared in a variety of notations, including the decimal notation. This avoids manual conversion of constants into their internal machine representation, for example, conversion of -5 into (11111010)2 or 10.5 into (41A80000)16.

Statement format

An assembly language statement has the following format:

[Label] <Opcode> <operand spec>[,<operand spec> ..]

where the notation [..] indicates that the enclosed specification is optional. If a label is specified in a statement, it is associated as a symbolic name with the memory word(s) generated for the statement. <operand spec> has the following syntax:

<symbolic name> [+<displacement>][(<index register>)]

Thus, some possible operand forms are: AREA, AREA+5, AREA(4), and AREA+5(4). The first specification refers to the memory word with which the name AREA is associated. The second specification refers to the memory word 5 words away from the word with the name AREA; here 5 is the displacement or offset from AREA. The third specification implies indexing with index register 4, that is, the operand address is obtained by adding the contents of index register 4 to the address of AREA. The last specification is a combination of the previous two specifications.

Objectives:

At the end of this unit the students would be able to:
- Write assembly language programs
- Use basic assembler functions
- Understand the passes and tasks of an assembler

2.2 Assembly Language

In the simple assembly language used in this unit, each statement has two operands. The first operand is always a register, which can be any one of AREG, BREG, CREG and DREG. The second operand refers to a memory word using a symbolic name and an optional displacement. (Note that indexing is not permitted.)
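The operand rules above can be illustrated with a small parser. This is only a minimal sketch in Python: the names MemoryOperand, parse_memory_operand and parse_operands are hypothetical, and the rule that a symbolic name starts with a letter is an assumption, since the text does not define the spelling of names.

```python
import re
from typing import NamedTuple, Tuple

# Register names allowed as the first operand (from Section 2.2).
REGISTERS = ("AREG", "BREG", "CREG", "DREG")


class MemoryOperand(NamedTuple):
    symbol: str        # symbolic name, e.g. "AREA"
    displacement: int  # offset in words; 0 when omitted


# <symbolic name>[+<displacement>]; an index register part is not accepted.
# Assumption: a name is a letter followed by letters or digits.
_OPERAND = re.compile(r"([A-Za-z][A-Za-z0-9]*)\s*(?:\+\s*(\d+))?")


def parse_memory_operand(text: str) -> MemoryOperand:
    """Parse the second operand, e.g. 'AREA' or 'AREA+5'."""
    match = _OPERAND.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid memory operand: {text!r}")
    return MemoryOperand(match.group(1), int(match.group(2) or 0))


def parse_operands(text: str) -> Tuple[str, MemoryOperand]:
    """Parse '<register>, <memory operand>' as described in Section 2.2."""
    register, separator, memory = text.partition(",")
    register = register.strip()
    if not separator or register not in REGISTERS:
        raise ValueError(f"first operand must be a register: {text!r}")
    return register, parse_memory_operand(memory)
```

For instance, parse_operands("BREG, ONE") yields the register name BREG and the operand ONE with displacement 0, while an indexed form such as AREA(4) is rejected because indexing is not permitted in this language.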

Table 2.0: Mnemonic operation codes

Table 2.0 lists the mnemonic opcodes for machine instructions. The MOVE instructions move a value between a memory word and a register. In the MOVER instruction the second operand is the source operand and the first operand is the target operand. The converse is true for the MOVEM instruction. All arithmetic is performed in a register (i.e. the result replaces the contents of a register) and sets a condition code. A comparison instruction sets a condition code analogous to a subtract instruction, without affecting the values of its operands. The condition code can be tested by a Branch on Condition (BC) instruction. The assembly statement corresponding to it has the format:

BC <condition code spec>, <memory address>

It transfers control to the memory word with the address <memory address> if the current value of the condition code matches <condition code spec>. For simplicity, we assume <condition code spec> to be a character string with an obvious meaning, e.g. GT, EQ, etc. A BC statement with the condition code spec ANY implies unconditional transfer of control. In a machine language program, we show all addresses and constants in decimal rather than in octal or hexadecimal.

Assembly Language Statements

An assembly program contains three kinds of statements:
1. Imperative statements
2. Declaration statements
3. Assembler directives

Imperative statements

An imperative statement indicates an action to be performed during the execution of the assembled program. Each imperative statement typically translates into one machine instruction.

Declaration statements

The syntax of declaration statements is as follows:

[Label] DS <constant>
[Label] DC <value>

The DS (short for declare storage) statement reserves areas of memory and associates names with them. Consider the following DS statements:

A DS 1
G DS 200

The first statement reserves a memory area of 1 word and associates the name A with it. The second statement reserves a block of 200 memory words. The name G is associated with the first word of the block. Other words in the block can be accessed through offsets from G, e.g. G+5 is the sixth word of the memory block.

The DC (short for declare constant) statement constructs memory words containing constants. The statement

ONE DC 1

associates the name ONE with a memory word containing the value 1. The programmer can declare constants in different forms: decimal, binary, hexadecimal, etc. The assembler converts them to the appropriate internal form.

Assembler Directives

Assembler directives instruct the assembler to perform certain actions during the assembly of a program.

Some assembler directives are described in the following.

START <constant>
This directive indicates that the first word of the target program generated by the assembler should be placed in the memory word with address <constant>.

END [<operand spec>]
This directive indicates the end of the source program. The optional <operand spec> indicates the address of the instruction where the execution of the program should begin. (By default, execution begins with the first instruction of the assembled program.)

Self Assessment Questions
1) What is an assembler? Explain its basic functionality.
2) What are the different assembler directives?
3) Is an assembler required for developing software applications? Give your comments.

2.3 Basic Assembler Functions

An assembler must perform the following tasks:
1. Generate instructions
   a. Evaluate the mnemonic in the operation field to produce its machine code.
   b. Evaluate the subfields to find the value of each symbol, process literals and assign addresses.
2. Process pseudo-operations

We can group these tasks into two passes, or sequential scans over the input; associated with each task are one or more assembler modules.

Necessity of Two Passes for an Assembler

Because symbols can appear before they are defined, it is convenient to make two passes over the input. The first pass only defines the symbols; the second pass can then generate the instructions and addresses.

Purposes of pass 1 and pass 2

Pass 1: Purpose: define symbols and literals
1. Determine the length of machine instructions
2. Keep track of the Location Counter (LC)
3. Remember the values of symbols until pass 2
4. Process some pseudo-operations

Pass 2: Purpose: generate the object program
1. Look up the values of symbols

2. Generate instructions
3. Generate data
4. Process pseudo-ops

Self Assessment Questions
1) What are the basic functions of an assembler?

2.4 Design Specification of an Assembler

There are six steps to be followed in the design of an assembler:
1. Specify the problem.
2. Specify the data structures.
3. Define the format of the data structures.
4. Specify the algorithm.
5. Look for modularity (the capability of one program to be subdivided into independent programming units).
6. Repeat 1 through 5 on each module.

In the first step we specify the function the assembler has to perform. The second step specifies the data the assembler needs in order to perform its operations. This data is stored in the form of tables, which are called databases (data structures); the assembler uses the information present in these databases for further processing. In the third step we specify the format in which data is stored in each database, and its contents. The fourth step gives the algorithm, which has to be converted into a program to get the result from the assembler. The fifth step divides the program into subproblems, which enables the designer to write the assembler efficiently. Finally, the same steps are repeated for each of the subproblems into which the program has been divided.

Specify the problem or Statement of problem

The fundamental information requirements arise in the synthesis phase of an assembler. Hence it is best to begin by considering the information requirements of the synthesis tasks. We then consider how to make this information available, i.e. whether it should be collected during analysis or derived during synthesis. Consider the assembly statement

MOVER BREG, ONE

We must have the following information to synthesize the machine instruction corresponding to this statement:
1. Address of the memory word with which the name ONE is associated,
2. Machine operation code corresponding to the mnemonic MOVER.

The first item of information depends on the source program. Hence it must be made available by the analysis phase. The second item of information does not depend on the source program; it depends only on the assembly language. Hence the synthesis phase can determine this information for itself.

2.4.1 Data Structures

The second step in our design procedure is to establish the databases that we have to work with.

Pass 1 Data Structures
1. Input source program.
2. A Location Counter (LC), used to keep track of each instruction's location.
3. A table, the Machine-Operation Table (MOT), that indicates the symbolic mnemonic for each instruction and its length (two, four, or six bytes).
4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 1.
5. A table, the Symbol Table (ST), that is used to store each label and its corresponding value.
6. A table, the Literal Table (LT), that is used to store each literal encountered and its corresponding assigned location.
7. A copy of the input to be used by pass 2.

Pass 2 Data Structures
1. Copy of the source program input to pass 1.
2. Location Counter (LC).
3. A table, the Machine-Operation Table (MOT), that indicates for each instruction its symbolic mnemonic, length (two, four, or six bytes), binary machine opcode and instruction format.
4. A table, the Pseudo-Operation Table (POT), that indicates the symbolic mnemonic and action to be taken for each pseudo-op in pass 2.
5. A table, the Symbol Table (ST), prepared by pass 1, containing each label and its corresponding value.
6. A table, the Base Table (BT), that indicates which registers are currently specified as base registers by USING pseudo-ops and what the specified contents of these registers are.
7. A work space, INST, that is used to hold each instruction as its various parts are being assembled together.
8. A work space, PRINT LINE, used to produce a printed listing.
9. A work space, PUNCH CARD, used prior to actual outputting for converting the assembled instructions into the format needed by the loader.
10. An output deck of assembled instructions in the format needed by the loader.
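The tables listed above can be pictured as ordinary in-memory structures. The following Python sketch is one possible arrangement, not the design prescribed by this unit: the class and field names are hypothetical, the action names in the POT are placeholders, and the MOT rows reuse the four entries of Table 2.1 with the assumption that its two-bit length field counts two-byte halfwords.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class MotEntry:
    mnemonic: str  # symbolic mnemonic (used by pass 1 and pass 2)
    opcode: int    # binary machine opcode (needed only by pass 2)
    length: int    # instruction length in bytes (needed by both passes)
    fmt: int       # instruction format code (needed only by pass 2)


# Fixed table: never altered during assembly. Rows taken from Table 2.1;
# lengths assume the table's length field is in 2-byte halfwords.
MOT: Dict[str, MotEntry] = {
    entry.mnemonic: entry
    for entry in (
        MotEntry("A", 0x5A, 4, 0b001),
        MotEntry("AH", 0x4A, 4, 0b001),
        MotEntry("AL", 0x5E, 4, 0b001),
        MotEntry("ALR", 0x1E, 2, 0b000),
    )
}

# Fixed table: pseudo-op mnemonic -> action name (placeholder strings).
POT: Dict[str, str] = {
    "START": "set_location_counter",
    "USING": "record_base_register",
    "DS": "reserve_storage",
    "DC": "define_constant",
    "END": "finish_pass",
}


@dataclass
class AssemblerTables:
    """Tables that are filled in while the source program is scanned."""
    location_counter: int = 0
    symbol_table: Dict[str, int] = field(default_factory=dict)
    literal_table: Dict[str, Optional[int]] = field(default_factory=dict)
    base_table: Dict[int, int] = field(default_factory=dict)

    def define_symbol(self, label: str) -> None:
        """Enter a label with the current LC value; reject duplicates."""
        if label in self.symbol_table:
            raise ValueError(f"duplicate label: {label}")
        self.symbol_table[label] = self.location_counter
```

The split mirrors the distinction made in the text: MOT and POT are fixed tables known before assembly starts, whereas the symbol, literal and base tables are built up as the passes run.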

Fig. 2.1: Data structures of the Assembler

Format of Data Structures

The third step in our design procedure is to specify the format and content of each of the data structures. Pass 2 requires a machine-operation table (MOT) containing the name, length, binary code and format; pass 1 requires only the name and length. Instead of using two different tables, we construct a single MOT. The machine-operation table (MOT) and the pseudo-operation table (POT) are examples of fixed tables: the contents of these tables are not filled in or altered during the assembly process.

Table 2.1: Format of the machine-op table (MOT), 6 bytes per entry

Mnemonic opcode        Binary opcode          Instruction length  Instruction format  Not used here
(4 bytes, characters)  (1 byte, hexadecimal)  (2 bits, binary)    (3 bits, binary)    (3 bits)
Abbb                   5A                     10                  001
AHbb                   4A                     10                  001
ALbb                   5E                     10                  001
ALRb                   1E                     01                  000
.                      .                      .                   .

b represents blank

2.4.2 The Flow Chart for Pass-1

The primary function performed by the analysis phase is the building of the symbol table. For this purpose it must determine the addresses with which the symbolic names used in a program are associated. It is possible to determine some addresses directly, e.g. the address of the first instruction in the program; others must be inferred.

To implement memory allocation, a data structure called the location counter (LC) is introduced. The location counter is always made to contain the address of the next memory word in the target program. It is initialized to the constant specified in the START statement. Whenever the analysis phase sees a label in an assembly statement, it enters the label and the contents of LC in a new entry of the symbol table. It then finds the number of memory words required by the assembly statement and updates the LC contents. This ensures that LC points to the next memory word in the target program even when machine instructions have different lengths and DS/DC statements reserve different amounts of memory.

To update the contents of LC, the analysis phase needs to know the lengths of the different instructions. This information depends only on the assembly language, hence the mnemonics table can be extended to include it in a new field called length. We refer to the processing involved in maintaining the location counter as LC processing.
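LC processing as described above can be sketched in a few lines of Python. The sketch runs over the sample program that appears in Table 2.2 and makes some assumptions that the text does not state explicitly: every imperative statement and every DC statement occupies one memory word, DS reserves as many words as its operand says, and START and END generate no words. The names PROGRAM, statement_length and build_symbol_table are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (label, mnemonic, operand text) for each statement of the sample program.
Statement = Tuple[Optional[str], str, str]

PROGRAM: List[Statement] = [
    (None, "START", "101"),
    (None, "READ", "N"),
    (None, "MOVER", "BREG, ONE"),
    (None, "MOVEM", "BREG, TERM"),
    ("AGAIN", "MULT", "BREG, TERM"),
    (None, "MOVER", "CREG, TERM"),
    (None, "ADD", "CREG, ONE"),
    (None, "MOVEM", "CREG, TERM"),
    (None, "COMP", "CREG, N"),
    (None, "BC", "LE, AGAIN"),
    (None, "MOVEM", "BREG, RESULT"),
    (None, "PRINT", "RESULT"),
    (None, "STOP", ""),
    ("N", "DS", "1"),
    ("RESULT", "DS", "1"),
    ("ONE", "DC", "1"),
    ("TERM", "DS", "1"),
    (None, "END", ""),
]


def statement_length(mnemonic: str, operands: str) -> int:
    """Number of memory words a statement contributes to the target program."""
    if mnemonic in ("START", "END"):
        return 0                 # directives generate no memory words
    if mnemonic == "DS":
        return int(operands)     # DS n reserves n words
    return 1                     # assumption: one word per instruction or DC


def build_symbol_table(program: List[Statement]) -> Dict[str, int]:
    """Scan the program once, maintaining LC and recording each label."""
    location_counter = 0
    symbol_table: Dict[str, int] = {}
    for label, mnemonic, operands in program:
        if mnemonic == "START":
            location_counter = int(operands)  # LC starts at the START constant
            continue
        if label is not None:
            if label in symbol_table:
                raise ValueError(f"duplicate label: {label}")
            symbol_table[label] = location_counter
        location_counter += statement_length(mnemonic, operands)
    return symbol_table
```

Calling build_symbol_table(PROGRAM) associates AGAIN with 104, N with 113, RESULT with 114, ONE with 115 and TERM with 116, which agrees with the addresses used in the target code of Table 2.2.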

Self Assessment Questions
1. What do you mean by pass 1 and pass 2 of an assembler?
2. What are the design specifications of an assembler?
3. Write any assembly language program of your choice.

2.5 Tasks of Assemblers

Tasks performed by the passes of a two-pass assembler are as follows:

Pass I
1. Separate the symbol, mnemonic opcode and operand fields.
2. Build the symbol table.
3. Perform LC processing.
4. Construct the intermediate representation.

Pass II
Synthesize the target program.

Pass I performs analysis of the source program and synthesis of the intermediate representation, while Pass II processes the intermediate representation to synthesize the target program. The design details of assembler passes are discussed after introducing advanced assembler directives and their influence on LC processing.
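The synthesis step of Pass II can be sketched as follows. This Python fragment is an illustration under stated assumptions rather than the assembler's actual design: the opcode numbers, the codes 2 and 3 for BREG and CREG, and the code 2 for the condition LE are read off the target code in Table 2.2, the codes for AREG and DREG are guessed by extension, and the symbol table is assumed to have been produced already by Pass I. The names OPCODES, REGISTER_CODES, CONDITION_CODES, resolve and synthesize are hypothetical.

```python
from typing import Dict

# Opcode numbers as they appear in the target code of Table 2.2.
OPCODES: Dict[str, int] = {
    "STOP": 0, "ADD": 1, "MULT": 3, "MOVER": 4, "MOVEM": 5,
    "COMP": 6, "BC": 7, "READ": 9, "PRINT": 10,
}
# BREG=2 and CREG=3 are visible in Table 2.2; AREG=1 and DREG=4 are assumed.
REGISTER_CODES: Dict[str, int] = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}
# Only LE appears in Table 2.2; other condition codes are left out on purpose.
CONDITION_CODES: Dict[str, int] = {"LE": 2}

# Symbol table assumed to be the output of Pass I for the sample program.
SYMBOL_TABLE: Dict[str, int] = {
    "AGAIN": 104, "N": 113, "RESULT": 114, "ONE": 115, "TERM": 116,
}


def resolve(operand: str, symbol_table: Dict[str, int]) -> int:
    """Turn '<symbolic name>[+<displacement>]' into a memory address."""
    name, _, displacement = operand.partition("+")
    return symbol_table[name.strip()] + int(displacement or 0)


def synthesize(mnemonic: str, operands: str, symbol_table: Dict[str, int]) -> str:
    """Produce one target instruction in the layout used by Table 2.2."""
    opcode = OPCODES[mnemonic]
    parts = [part.strip() for part in operands.split(",")] if operands.strip() else []
    first_field = 0   # register or condition code; 0 when absent
    address = 0       # memory operand address; 0 when absent
    if len(parts) == 2:
        codes = CONDITION_CODES if mnemonic == "BC" else REGISTER_CODES
        first_field = codes[parts[0]]
        address = resolve(parts[1], symbol_table)
    elif len(parts) == 1:
        address = resolve(parts[0], symbol_table)
    return f"+ {opcode:02d} {first_field} {address:03d}"
```

With these tables, synthesize("MOVER", "BREG, ONE", SYMBOL_TABLE) gives "+ 04 2 115" and synthesize("BC", "LE, AGAIN", SYMBOL_TABLE) gives "+ 07 2 104", matching the corresponding rows of Table 2.2. Because all symbol addresses are already known when this step runs, forward references need no special handling here.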

2.6 Translation of Assemblers

This section discusses the two-pass and single-pass assembly schemes.

Two Pass Translation

Two-pass translation of an assembly language program can handle forward references easily. LC processing is performed in the first pass and symbols defined in the program are entered into the symbol table. The second pass synthesizes the target form using the address information found in the symbol table. In effect, the first pass performs analysis of the source program while the second pass performs synthesis of the target program. The first pass constructs an intermediate representation (IR) of the source program for use by the second pass. This representation consists of two main components: data structures, e.g. the symbol table, and a processed form of the source program. The latter component is called intermediate code (IC).

Single Pass Translation

LC processing and construction of the symbol table proceed as in two-pass translation. The problem of forward references is tackled using a process called backpatching. The operand field of an instruction containing a forward reference is left blank initially. The address of the forward-referenced symbol is put into this field when its definition is encountered.

Table 2.2: A sample assembly program and its target code

        START   101
        READ    N               101)  + 09 0 113
        MOVER   BREG, ONE       102)  + 04 2 115
        MOVEM   BREG, TERM      103)  + 05 2 116
AGAIN   MULT    BREG, TERM      104)  + 03 2 116
        MOVER   CREG, TERM      105)  + 04 3 116
        ADD     CREG, ONE       106)  + 01 3 115
        MOVEM   CREG, TERM      107)  + 05 3 116
        COMP    CREG, N         108)  + 06 3 113
        BC      LE, AGAIN       109)  + 07 2 104
        MOVEM   BREG, RESULT    110)  + 05 2 114
        PRINT   RESULT          111)  + 10 0 114
        STOP                    112)  + 00 0 000
N       DS      1               113)
RESULT  DS      1               114)
ONE     DC      1               115)  + 00 0 001
TERM    DS      1               116)
        END

In the program of Table 2.2, the instruction corresponding to the statement MOVER BREG, ONE can be only partially synthesized since ONE is a forward reference. Hence only the instruction opcode and the address of BREG will be assembled to reside in location 102. The need for inserting the second operand's address at a later stage can be indicated by adding an entry to the Table of Incomplete Instructions (TII). This entry is a pair (<instruction address>, <symbol>), e.g. (102, ONE) in this case.

By the time the END statement is processed, the symbol table would contain the addresses of all symbols defined in the source program and TII would contain information describing all forward references. The assembler can now process each entry in TII to complete the concerned instruction. For example, the entry (102, ONE) would be processed by obtaining the address of ONE from the symbol table and inserting it in the operand address field of the instruction with assembled address 102. Alternatively, entries in TII can be processed in an incremental manner. Thus, when the definition of some symbol symb is encountered, all forward references to symb can be processed.

2.7 Examples: MASM Assembler and SPARC Assembler

MASM: Microsoft Macro Assembler

The Microsoft Macro Assembler (MASM) is an assembler for the x86 family of microprocessors, originally produced for the Microsoft MS-DOS operating system. It supported a wide variety of macro facilities and structured programming idioms, including high-level constructions for looping, procedure calls and alternation (therefore, MASM is an example of a high-level assembler). Later versions added the capability of producing programs for the Windows operating systems that were released to follow on from MS-DOS. MASM is one of the few Microsoft development tools for which there was no separate 16-bit and 32-bit version.

The assembler affords the programmer looking for additional performance a three-pronged approach to performance-based solutions. MASM can build very small, high-performance executable files that are well suited where size and speed matter.
When additional performance is required for other languages, MASM can enhance the performance of these languages with small, fast and powerful dynamic link libraries. For programmers who work in Microsoft Visual C/C++, MASM builds modules and libraries that are in the same format, so the C/C++ programmer can build modules or libraries in MASM and directly link them into their own C/C++ programs. This allows the C/C++ programmer to target critical areas of their code in a very efficient and convenient manner: graphics manipulation, games, very high speed data manipulation and processing, high-speed parsing, encryption, compression and any other form of information processing that is processor intensive.

For programmers who are not familiar with 32-bit Windows assembler, there is speed and performance available that they may never have seen before. Contrary to popular legend, anyone who can write a Windows application in C/C++, Basic, Pascal or other similar compiler-based languages can write it in MASM with very similar looking code, provided they learn the MASM high-level syntax.

MASM32 has been designed to be familiar to programmers who have already written API-based code in Windows. The invoke syntax of MASM allows functions to be called in much the same way as they are called in a high-level compiler. The traditional notation for calling a function is as follows:

push par4
push par3
push par2
push par1
call FunctionName
mov retval, eax

SPARC Assembler

SPARC (which stands for Scalable Processor ARChitecture) is an open set of technical specifications that any person or company can license and use to develop microprocessors and other semiconductor devices based on published industry standards. SPARC was invented in the labs of Sun Microsystems Inc., based upon pioneering research into Reduced Instruction Set Computing (RISC) at the University of California at Berkeley. The first standard product based on the SPARC architecture was produced by Sun and Fujitsu in 1986; Sun followed in 1987 with its first workstation based on a SPARC processor. In 1989, Sun Microsystems transferred ownership of the SPARC specifications to an independent, non-profit organization, SPARC International, which administers and licenses the technology and provides compliance testing and other services for its members.

SPARC is a modern, fast, pipelined architecture. Its assembly language illustrates most of the features found in the assembly languages of the variety of computer architectures which have been developed.

2.8 Summary

This unit discussed assemblers and their capabilities. An assembler is a program (system software) which accepts an assembly language program as input and produces its equivalent machine language program as output, along with information for the loader. The input to the assembler is called the source program and the output is called the object program. An assembler can be implemented in two passes, pass 1 and pass 2. The corresponding flow charts are covered in Section 2.4.2; students can go through the flow charts. We have also discussed pass structures and two-pass assemblers in detail. Finally, we took MASM and SPARC as examples; both assemblers are popular in the market. SPARC is a Sun product and MASM is a Microsoft product.

2.9 Terminal Questions
1) What are pass 1 and pass 2 of an assembler? Write their data structures.
2) Draw the flow chart of pass 1 of an assembler.
3) What is MASM? Explain its features.
4) Write the Pass 1 and Pass 2 data structures in detail.
