
MODULE I

MCA-303 SYSTEM SOFTWARE

ADMN 2011-12

1.1 Review of assembly and machine language programming


1.1.1 Machine Language

This is a sequence of instructions written as binary numbers, consisting of 1s and 0s, to which the computer responds directly. Machine language was initially referred to as code, although the term code is now used more broadly to refer to any program text. An instruction in any machine language has at least two parts. The first part is the command or operation, which tells the computer what function is to be performed. Every computer has an operation code for each of its functions. The second part of the instruction is the operand, which tells the computer where to find or store the data that has to be manipulated. Just as hardware is classified into generations based on technology, computer languages are also classified into generations based on the level of interaction with the machine. Machine language is considered to be the first-generation language.

Advantage of Machine Language

It is faster in execution since the computer directly starts executing it.

Disadvantage of Machine Language


It is difficult to understand and develop a program using machine language. Anybody going through this program for checking will have a difficult task understanding what will be achieved when this program is executed. Nevertheless, the computer hardware recognizes only this type of instruction code.

Dept. of Computer Science And Applications, SJCET, Palai


1.1.2 Assembly Language


When we employ symbols (letters, digits or special characters) for the operation part, the address part and other parts of the instruction code, this representation is called an assembly language program. Assembly language is considered to be the second-generation language.

Machine and assembly languages are referred to as low level languages since the coding for a problem is at the individual instruction level. Each machine has its own assembly language, which depends on the internal architecture of the processor.

An assembler is a translator which takes its input in the form of an assembly language program and produces machine language code as its output.

The following program is an example of an assembly language program for adding two numbers X and Y and storing the result in some memory location.


From this program, it is clear that the use of mnemonics (in our example LD, ADD and HALT are the mnemonics) has improved the readability of the program significantly. An assembly language program cannot be executed by a machine directly, as it is not in binary form. An assembler is needed to translate an assembly language program into object code executable by the machine. This is illustrated in the figure.

[Figure: Assembler]


Advantage of Assembly Language

Writing a program in assembly language is more convenient than writing it in machine language. Instead of binary sequences, as in machine language, the program is written in the form of symbolic instructions. This makes it somewhat more readable.


Disadvantages of Assembly Language

Assembly language is specific to a particular machine architecture: each assembly language is designed for a specific make and model of microprocessor. Assembly language programs written for one processor will therefore not work on a processor that is architecturally different, which is why assembly language programs are not portable. An assembly language program also cannot be run as directly as a machine language program, since it first has to be translated into machine (binary) code. The time and cost of creating programs in machine and assembly languages was quite high.

1.2 System software and Application software


Software is mainly classified into two types: system software and application software.

1.2.1 System software

System software is any computer software which manages and controls computer hardware so that application software can perform a task. Operating systems, such as Microsoft Windows, Mac OS X or Linux, are prominent examples of system software.

System software performs tasks like transferring data from memory to disk, or rendering text onto a display device. Specific kinds of system software include loading programs, operating systems, device drivers, programming tools, compilers, assemblers, linkers, and utility software.

System software is responsible for managing a variety of independent hardware components, so that they can work together harmoniously. Its purpose is to unburden the application software programmer from the often complex details of the particular computer being used, including such accessories as communications devices, printers, device readers, displays and keyboards, and also to partition the computer's resources, such as memory and processor time, in a safe and stable manner.

1.2.2 Application software

Application software consists of programs designed to perform specific tasks for users. Application software can be used as a productivity/business tool; to assist with graphics and multimedia projects; to support home, personal, and educational activities; and to facilitate communications. Specific application software products, called software packages, are available from software vendors. Word processing software is one example.

There are two main categories of application programs: business programs and scientific application programs. Most programming languages are designed to be good for one category of applications but not necessarily for the other, although there are some general-purpose languages that support both types. Business applications are characterized by processing of large inputs and large outputs and by high volume data storage and retrieval, but call for simple calculations. Languages which are suitable for business program development must support high volume input, output and storage but do not need to support complex calculations. On the other hand, programming languages that are designed for writing scientific programs contain very powerful instructions for calculations but rather poor instructions for input, output etc. Amongst traditionally used programming languages, COBOL (Common Business-Oriented Language) is more suitable for business applications whereas FORTRAN (Formula Translation) is more suitable for scientific applications.

Major differences between system software and application software

1) System software runs the system, whereas application software runs on top of the system software.
2) System software consists of programs that run and control the hardware units of the system; application software does not.
3) System programs are typically supplied with the platform, for example as dll and exe files on Windows or as rpm (Red Hat Package Manager) files on Linux, whereas application software is developed on the basis of these files or by using different language files.
4) System software is not itself meant to carry out end-user tasks, whereas application software is made specifically for users.

1.3. Language Processors


1.3.1 Introduction

Language processing activities arise due to the differences between the manner in which a software designer describes the ideas concerning the behaviour of software and the manner in which these ideas are implemented in a computer system. The interpreter is a language translator. This leads to many similarities between translators and interpreters. From a practical viewpoint, many differences also exist between translators and interpreters.


The absence of a target program implies the absence of an output interface of the interpreter. Thus the language processing activities of an interpreter cannot be separated from its program execution activities. Hence we say that an interpreter 'executes' a program written in a programming language (PL).

1.3.2 Problem Oriented and Procedure Oriented Languages

The three consequences of the semantic gap mentioned at the start of this section are in fact the consequences of a specification gap. Software systems are poor in quality and require large amounts of time and effort to develop due to difficulties in bridging the specification gap. A classical solution is to develop a PL such that the PL domain is very close or identical to the application domain. Such PLs can only be used for specific applications; hence they are called problem-oriented languages. They have large execution gaps; however, this is acceptable because the gap is bridged by the translator or interpreter and does not concern the software designer.

A procedure-oriented language provides general purpose facilities required in most application domains. Such a language is independent of specific application domains.

The fundamental language processing activities can be divided into those that bridge the specification gap and those that bridge the execution gap. We name these activities as:

1. Program generation activities
2. Program execution activities

A program generation activity aims at automatic generation of a program. The source language is a specification language of an application domain and the target language is typically a procedure-oriented PL. A program execution activity organizes the execution of a program written in a PL on a computer system. Its source language could be a procedure-oriented language or a problem-oriented language.

Program Generation

The program generator is a software system which accepts the specification of a program to be generated, and generates a program in the target PL. In effect, the program generator introduces a new domain between the application and PL domains; we call this the program generator domain. The specification gap is now the gap between the application domain and the program generator domain. This gap is smaller than the gap between the application domain and the target PL domain. Reduction in the specification gap increases the reliability of the generated program. Since the generator domain is close to the application domain, it is easy for the designer or programmer to write the specification of the program to be generated. The harder task of bridging the gap to the PL domain is performed by the generator.

This arrangement also reduces the testing effort. Proving the correctness of the program generator amounts to proving the correctness of the transformation. This would be performed while implementing the generator. To test an application generated by using the generator, it is necessary to verify only the correctness of the specification input to the program generator. This is a much simpler task than verifying the correctness of the generated program.


This task can be further simplified by providing a good diagnostic (i.e. error indication) capability in the program generator, which would detect inconsistencies in the specification.

It is more economical to develop a program generator than to develop a problem-oriented language. This is because a problem-oriented language suffers a very large execution gap between the PL domain and the execution domain, whereas the program generator has a smaller semantic gap to the target PL domain, which is the domain of a standard procedure-oriented language. The execution gap between the target PL domain and the execution domain is bridged by the compiler or interpreter for the PL.

Program Execution

Two popular models for program execution are translation and interpretation.

Program translation

The program translation model bridges the execution gap by translating a program written in a PL, called the source program (SP), into an equivalent program in the machine or assembly language of the computer system, called the target program (TP). Characteristics of the program translation model are:

A program must be translated before it can be executed.
The translated program may be saved in a file.
The saved program may be executed repeatedly.
A program must be retranslated following modifications.

Program interpretation

The interpreter reads the source program and stores it in its memory. During interpretation it takes a source statement, determines its meaning and performs actions which implement it. This includes computational and input-output actions.

The CPU uses a program counter (PC) to note the address of the next instruction to be executed. This instruction is subjected to the instruction execution cycle consisting of the following steps:

1. Fetch the instruction.
2. Decode the instruction to determine the operation to be performed, and also its operands.
3. Execute the instruction.
At the end of the cycle, the instruction address in PC is updated and the cycle is repeated for the next instruction. Program interpretation can proceed in an analogous manner. Thus, the PC can indicate which statement of the source program is to be interpreted next. This statement would be subjected to the interpretation cycle, which could consist of the following steps:

1. Fetch the statement.
2. Analyze the statement and determine its meaning, viz. the computation to be performed and its operands.
3. Execute the meaning of the statement.

From this analogy, we can identify the following characteristics of interpretation: the source program is retained in the source form itself, i.e. no target program form exists; and a statement is analyzed during its interpretation.
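The interpretation cycle described above can be sketched as a small loop in Python. This is only an illustration: the toy statements (SET, ADD, PRINT) and the function name `interpret` are assumptions made for this sketch and are not part of these notes.

```python
def interpret(source_lines):
    """Interpret a toy language one statement at a time (hypothetical syntax)."""
    variables = {}   # data store maintained by the interpreter
    output = []      # values produced by PRINT statements
    pc = 0           # index of the next source statement to interpret
    while pc < len(source_lines):
        statement = source_lines[pc]       # 1. fetch the statement
        op, *args = statement.split()      # 2. analyze it: operation and operands
        if op == "SET":                    # 3. execute its meaning
            variables[args[0]] = int(args[1])
        elif op == "ADD":
            variables[args[0]] += variables[args[1]]
        elif op == "PRINT":
            output.append(variables[args[0]])
        else:
            raise ValueError(f"unknown statement: {statement!r}")
        pc += 1                            # move on to the next statement
    return output


program = ["SET X 2", "SET Y 3", "ADD X Y", "PRINT X"]
print(interpret(program))  # [5]
```

Note that no target program is produced: the source statements are kept as they are and each one is analyzed again every time it is reached.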

Comparison

A fixed cost (the translation overhead) is incurred in the use of the program translation model. If the source program is modified, the translation cost must be incurred again irrespective of the size of the modification. However, execution of the target program is efficient since the target program is in the machine language. Use of the interpretation model does not incur the translation overheads. This is advantageous if a program is modified between executions, as in program testing and debugging.

1.3.3 Language Processing Activities

Language Processing = Analysis of SP + Synthesis of TP.

This definition motivates a generic model of language processing activities. We refer to the collection of language processor components engaged in analyzing a source program as the analysis phase of the language processor. Components engaged in synthesizing a target program constitute the synthesis phase.

A specification of the source language forms the basis of source program analysis. The specification consists of three components:

1. Lexical rules, which govern the formation of valid lexical units in the source language.
2. Syntax rules, which govern the formation of valid statements in the source language.
3. Semantic rules, which associate meaning with valid statements of the language.

The analysis phase uses each component of the source language specification to determine relevant information concerning a statement in the source program. Thus, analysis of a source statement consists of lexical, syntax and semantic analysis.

The synthesis phase is concerned with the construction of target language statements which have the same meaning as a source statement. Typically, this consists of two main activities:

Creation of data structures in the target program
Generation of target code

We refer to these activities as memory allocation and code generation, respectively.

Lexical Analysis (Scanning)


Lexical analysis identifies the lexical units in a source statement. It then classifies the units into different lexical classes, e.g. ids, constants etc., and enters them into different tables. This classification may be based on the nature of a string or on the specification of the source language. (For example, while an integer constant is a string of digits with an optional sign, a reserved id is an id whose name matches one of the reserved names mentioned in the language specification.) Lexical analysis builds a descriptor, called a token, for each lexical unit. A token contains two fields: class code and number in class. The class code identifies the class to which a lexical unit belongs; the number in class is the entry number of the lexical unit in the relevant table.
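As an illustration of tokens made of a class code and a number in class, here is a minimal scanner sketch in Python. The class names ("reserved", "id", "const", "op"), the reserved names and the function name `scan` are assumptions for this example; signed constants are left out to keep it short, and entry numbers start at 0.

```python
import re

RESERVED = ["if", "then", "else"]   # assumed reserved names, for illustration only
# One lexical unit: a string of digits, a name, or any other single character.
UNIT_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(\S))")


def scan(statement):
    """Return (tokens, tables); each token is (class code, number in class)."""
    tables = {"reserved": list(RESERVED), "id": [], "const": [], "op": []}
    tokens = []
    pos = 0
    while True:
        match = UNIT_RE.match(statement, pos)
        if match is None:                 # nothing but whitespace is left
            break
        number, name, other = match.groups()
        if number is not None:
            cls, unit = "const", number
        elif name is not None:
            cls = "reserved" if name in RESERVED else "id"
            unit = name
        else:
            cls, unit = "op", other
        if unit not in tables[cls]:       # enter a new lexical unit in its table
            tables[cls].append(unit)
        tokens.append((cls, tables[cls].index(unit)))
        pos = match.end()
    return tokens, tables


tokens, tables = scan("a = b + 10")
print(tokens)
print(tables["id"])
```

For the statement a = b + 10, the ids a and b receive entry numbers 0 and 1 in the id table, so the token for b is ("id", 1).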

Syntax Analysis (Parsing)

Syntax analysis processes the string of tokens built by lexical analysis to determine the statement class, e.g. assignment statement, if statement, etc. It then builds an intermediate code (IC) which represents the structure of the statement. The IC is passed to semantic analysis to determine the meaning of the statement.

Semantic analysis

Semantic analysis of declaration statements differs from the semantic analysis of imperative statements. The former results in addition of information to the symbol table, e.g. type, length and dimensionality of variables. The latter identifies the sequence of actions necessary to implement the meaning of a source statement. In both cases the structure of a source statement guides the application of the semantic rules. When semantic analysis determines the meaning of a subtree in the IC, it adds information to a table or adds an action to the sequence. It then modifies the IC to enable further semantic analysis. The analysis ends when the tree has been completely processed.

1.4. Assemblers
1.4.1 ELEMENTS OF ASSEMBLY LANGUAGE PROGRAMMING

An assembly language is a machine dependent, low level programming language which is specific to a certain computer system (or a family of computer systems). Compared to the machine language of a computer system, it provides three basic features which simplify programming:

1. Mnemonic operation codes: Use of mnemonic operation codes (also called mnemonic opcodes) for machine instructions eliminates the need to memorize numeric operation codes. It also enables the assembler to provide helpful diagnostics, for example indication of misspelt operation codes.
2. Symbolic operands: Symbolic names can be associated with data or instructions. These symbolic names can be used as operands in assembly statements. The assembler performs memory bindings to these names; the programmer need not know any details of the memory bindings performed by the assembler. This leads to a very important practical advantage during program modification, as discussed in Section 1.4.1.2.
3. Data declarations: Data can be declared in a variety of notations, including the decimal notation. This avoids manual conversion of constants into their internal machine representation, for example, conversion of -5 into (11111010), its 8-bit ones' complement form.

Statement format

An assembly language statement has the following format:

[Label] <Opcode> <operand spec>[,<operand spec> ..]

where the notation [..] indicates that the enclosed specification is optional. If a label is specified in a statement, it is associated as a symbolic name with the memory word(s) generated for the statement. <operand spec> has the following syntax:

<symbolic name> [+<displacement>][(<index register>)]

Thus, some possible operand forms are: AREA, AREA+5, AREA(4), and AREA+5(4). The first specification refers to the memory word with which the name AREA is associated. The second specification refers to the memory word 5 words away from the word with the name AREA. Here '5' is the displacement or offset from AREA. The third specification implies indexing with index register 4, that is, the operand address is obtained by adding the contents of index register 4 to the address of AREA. The last specification is a combination of the previous two specifications.

1.4.1.1 Assembly Language Statements

An assembly program contains three kinds of statements:

1. Imperative statements
2. Declaration statements
3. Assembler directives

Imperative statements

An imperative statement indicates an action to be performed during the execution of the assembled program. Each imperative statement typically translates into one machine instruction.
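The operand specification syntax given under the statement format can be checked with a short Python sketch. The regular expression and the function name `parse_operand` are assumptions made for illustration; the sketch only handles the four operand forms listed in the text.

```python
import re

OPERAND_RE = re.compile(
    r"^(?P<name>[A-Za-z]\w*)"       # <symbolic name>
    r"(?:\+(?P<disp>\d+))?"         # optional +<displacement>
    r"(?:\((?P<index>\d+)\))?$"     # optional (<index register>)
)


def parse_operand(spec):
    """Split an operand spec into (symbolic name, displacement, index register)."""
    match = OPERAND_RE.match(spec.strip())
    if match is None:
        raise ValueError(f"invalid operand specification: {spec!r}")
    displacement = int(match["disp"]) if match["disp"] else 0
    index_register = int(match["index"]) if match["index"] else None
    return match["name"], displacement, index_register


for spec in ["AREA", "AREA+5", "AREA(4)", "AREA+5(4)"]:
    print(spec, "->", parse_operand(spec))
```

A missing displacement is reported as 0 and a missing index register as None, so that the operand address can later be formed as address of name + displacement (+ contents of the index register, if any).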

Declaration statements

The syntax of declaration statements is as follows:

[Label] DS <constant>
[Label] DC '<value>'

The DS (short for declare storage) statement reserves areas of memory and associates names with them. Consider the following DS statements:

A DS 1
G DS 200

The first statement reserves a memory area of 1 word and associates the name A with it. The second statement reserves a block of 200 memory words. The name G is associated with the first word of the block. Other words in the block can be accessed through offsets from G, e.g. G+5 is the sixth word of the memory block, etc.

The DC (short for declare constant) statement constructs memory words containing constants. The statement

ONE DC '1'

associates the name ONE with a memory word containing the value '1'. The programmer can declare constants in different forms: decimal, binary, hexadecimal, etc. The assembler converts them to the appropriate internal form.

Use of constants

Contrary to the name 'declare constant', the DC statement does not really implement constants; it merely initializes memory words to given values. These values are not protected by the assembler; they may be changed by moving a new value into the memory word. For example, in Fig. 4.3 the value of ONE can be changed by executing an instruction MOVEM BREG, ONE.

An assembly program can use constants in the sense implemented in an HLL in two ways: as immediate operands, and as literals. Immediate operands can be used in an assembly statement only if the architecture of the target machine includes the necessary features. In such a machine, the assembly statement ADD AREG,5 is translated into an instruction with two operands: AREG and the value '5' as an immediate operand. Note that our simple assembly language does not support this feature, whereas the assembly language of Intel 8086 supports it (see Section 4.5).

ADD AREG, ='5'        =>        ADD AREG, FIVE
                                --------
                                FIVE DC '5'

      (a)                             (b)

Fig 1. Use of literals in an assembly program

A literal is an operand with the syntax ='<value>'. It differs from a constant because its location cannot be specified in the assembly program. This helps to ensure that its value is not changed during execution of a program. It differs from an immediate operand because no architectural provision is needed to support its use. An assembler handles a literal by mapping its use into other features of the assembly language. Figure 1(a) shows use of a literal ='5'. Figure 1(b) shows an equivalent arrangement using a DC statement FIVE DC '5'. When the assembler encounters the use of a literal in the operand field of a statement, it handles the literal using an arrangement similar to that shown in Fig. 1(b): it allocates a memory word to contain the value of the literal, and replaces the use of the literal in a statement by an operand expression referring to this word. The value of the literal is protected by the fact that the name and address of this word are not known to the assembly language programmer.
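The mapping from Fig. 1(a) to Fig. 1(b) can be sketched in Python as a rewrite of the source statements. This is an illustration under assumptions: the generated names LIT0, LIT1, ... and the function name `expand_literals` are hypothetical, only decimal literals are handled, and the input is assumed not to contain the END statement. A real assembler would allocate the memory word without giving it a programmer-visible name.

```python
import re

LITERAL_RE = re.compile(r"='(\d+)'")   # matches a literal such as ='5'


def expand_literals(statements):
    """Replace each literal by a generated name and append matching DC statements."""
    pool = {}   # literal value -> generated symbolic name (hypothetical names)

    def to_name(match):
        value = match.group(1)
        # Reuse the same memory word for repeated uses of the same literal.
        return pool.setdefault(value, f"LIT{len(pool)}")

    rewritten = [LITERAL_RE.sub(to_name, stmt) for stmt in statements]
    declarations = [f"{name} DC '{value}'" for value, name in pool.items()]
    return rewritten + declarations


for line in expand_literals(["ADD AREG, ='5'", "MULT BREG, ='5'"]):
    print(line)
```

Both uses of ='5' end up referring to the same generated word, which is declared once at the end of the statement list.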

Assembler directives

Assembler directives instruct the assembler to perform certain actions during the assembly of a program. Some assembler directives are described in the following.

START <constant>

This directive indicates that the first word of the target program generated by the assembler should be placed in the memory word with address <constant>.

END [<operand spec>]

This directive indicates the end of the source program. The optional <operand spec> indicates the address of the instruction where the execution of the program should begin. (By default, execution begins with the first instruction of the assembled program.)

1.4.1.2 Advantages of Assembly Language

The primary advantages of assembly language programming vis-a-vis machine language programming arise from the use of symbolic operand specifications. Figure 2 shows a changed program to compute N!/2, where rectangular boxes are used to highlight changes in the program. One statement has been inserted before the PRINT statement to implement division by 2. In the machine language program, this leads to changes in addresses of constants and reserved memory areas. Because of this, addresses used in most instructions of the program had to change. Such changes are not needed in the assembly program since operand specifications are symbolic in nature.


        START   101
        READ    N               101)  + 09 0 114
        MOVER   BREG, ONE       102)  + 04 2 116
        MOVEM   BREG, TERM      103)  + 05 2 117
AGAIN   MULT    BREG, TERM      104)  + 03 2 117
        MOVER   CREG, TERM      105)  + 04 3 117
        ADD     CREG, ONE       106)  + 01 3 116
        MOVEM   CREG, TERM      107)  + 05 3 117
        COMP    CREG, N         108)  + 06 3 114
        BC      LE, AGAIN       109)  + 07 2 104
        DIV     BREG, TWO       110)  + 08 2 118
        MOVEM   BREG, RESULT    111)  + 05 2 115
        PRINT   RESULT          112)  + 10 0 115
        STOP                    113)  + 00 0 000
N       DS      1               114)
RESULT  DS      1               115)
ONE     DC      '1'             116)  + 00 0 001
TERM    DS      1               117)
TWO     DC      '2'             118)  + 00 0 002
        END

Fig. 2

Design specification of an assembler

We use a four step approach to develop a design specification for an assembler:

1. Identify the information necessary to perform a task.
2. Design a suitable data structure to record the information.
3. Determine the processing necessary to obtain and maintain the information.
4. Determine the processing necessary to perform the task.

The fundamental information requirements arise in the synthesis phase of an assembler. Hence it is best to begin by considering the information requirements of the synthesis tasks. We then consider how to make this information available, i.e. whether it should be collected during analysis or derived during synthesis.

Synthesis phase

Consider the assembly statement


MOVER BREG, ONE

in Fig. 4.3. We must have the following information to synthesize the machine instruction corresponding to this statement:

1. Address of the memory word with which the name ONE is associated.
2. Machine operation code corresponding to the mnemonic MOVER.

The first item of information depends on the source program. Hence it must be made available by the analysis phase. The second item of information does not depend on the source program; it merely depends on the assembly language. Hence the synthesis phase can determine this information for itself. Based on the above discussion, we consider the use of two data structures during the synthesis phase:

1. Symbol table
2. Mnemonics table

Each entry of the symbol table has two primary fields: name and address. The table is built by the analysis phase. An entry in the mnemonics table has two primary fields: mnemonic and opcode. The synthesis phase uses these tables to obtain the machine address with which a name is associated, and the machine opcode corresponding to a mnemonic, respectively. Hence the tables have to be searched with the symbol name and the mnemonic as keys.

Analysis phase

The primary function performed by the analysis phase is the building of the symbol table. For this purpose it must determine the addresses with which the symbolic names used in a program are associated. It is possible to determine some addresses directly, e.g. the address of the first instruction in the program; however, others must be inferred. Consider the assembly program of Fig. 4.3. To determine the address of N, we must fix the addresses of all program elements preceding it. This function is called memory allocation.

To implement memory allocation a data structure called location counter (LC) is introduced. The location counter is always made to contain the address of the next memory word in the target program. It is initialized to the constant specified in the START statement. Whenever the analysis phase sees a label in an assembly statement, it enters the label and the contents of LC in a new entry of the symbol table. It then finds the number of memory words required by the assembly statement and updates the LC contents. (Hence the word 'counter' in 'location counter'.) This ensures that LC points to the next memory word in the target program even when machine instructions have different lengths and DS/DC statements reserve different amounts of memory. To update the contents of LC, the analysis phase needs to know the lengths of different instructions. This information depends only on the assembly language, hence the mnemonics table can be extended to include this information in a new field called length. We refer to the processing involved in maintaining the location counter as LC processing.


[Figure: mnemonics table with the fields mnemonic, opcode and length]

The tasks performed by the analysis and synthesis phase are as follows:

Analysis phase

1. Isolate the label, mnemonic opcode and operand fields of a statement.
2. If a label is present, enter the pair (symbol, <LC contents>) in a new entry of the symbol table.
3. Check validity of the mnemonic opcode through a lookup in the Mnemonics table.
4. Perform LC processing, i.e. update the value contained in LC by considering the opcode and operands of the statement.

Synthesis phase

1. Obtain the machine opcode corresponding to the mnemonic from the Mnemonics table.
2. Obtain the address of a memory operand from the Symbol table.
3. Synthesize a machine instruction or the machine form of a constant, as the case may be.
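The analysis and synthesis tasks listed above can be sketched in Python as two functions sharing a symbol table and a mnemonics table. This is a minimal sketch under assumptions, not a complete assembler: the opcodes, register codes and the condition code for LE are read off the machine code shown in Fig. 2 (the code for AREG is assumed), every instruction is taken to be one word long, and the function names are hypothetical.

```python
MNEMONICS = {   # mnemonic -> (opcode, length in words), opcodes as in Fig. 2
    "STOP": (0, 1), "ADD": (1, 1), "MULT": (3, 1), "MOVER": (4, 1),
    "MOVEM": (5, 1), "COMP": (6, 1), "BC": (7, 1), "DIV": (8, 1),
    "READ": (9, 1), "PRINT": (10, 1),
}
CODES = {"AREG": 1, "BREG": 2, "CREG": 3, "LE": 2}   # AREG code is an assumption
KEYWORDS = set(MNEMONICS) | {"START", "END", "DS", "DC"}


def split_fields(line):
    """Isolate the label, mnemonic and operand fields of a statement."""
    parts = line.replace(",", " ").split()
    label = None if parts[0] in KEYWORDS else parts.pop(0)
    return label, parts[0], parts[1:]


def analyse(lines):
    """Analysis phase: build the symbol table by LC processing."""
    symtab, lc = {}, 0
    for line in lines:
        label, mnemonic, operands = split_fields(line)
        if mnemonic == "START":
            lc = int(operands[0])          # LC starts at the START constant
            continue
        if mnemonic == "END":
            break
        if label is not None:
            symtab[label] = lc             # enter (symbol, <LC contents>)
        if mnemonic == "DS":
            lc += int(operands[0])         # reserve the requested words
        elif mnemonic == "DC":
            lc += 1
        elif mnemonic in MNEMONICS:        # validity check and length lookup
            lc += MNEMONICS[mnemonic][1]
        else:
            raise ValueError(f"invalid mnemonic: {mnemonic}")
    return symtab


def synthesise(lines, symtab):
    """Synthesis phase: return {address: (opcode, register code, operand)}."""
    code, lc = {}, 0
    for line in lines:
        _, mnemonic, operands = split_fields(line)
        if mnemonic == "START":
            lc = int(operands[0])
        elif mnemonic == "END":
            break
        elif mnemonic == "DS":
            lc += int(operands[0])         # storage only, no code generated
        elif mnemonic == "DC":
            code[lc] = (0, 0, int(operands[0].strip("'")))
            lc += 1
        else:
            opcode, length = MNEMONICS[mnemonic]
            register = CODES[operands[0]] if len(operands) == 2 else 0
            address = symtab[operands[-1]] if operands else 0
            code[lc] = (opcode, register, address)
            lc += length
    return code


PROGRAM = [
    "START 101", "READ N", "MOVER BREG, ONE", "MOVEM BREG, TERM",
    "AGAIN MULT BREG, TERM", "MOVER CREG, TERM", "ADD CREG, ONE",
    "MOVEM CREG, TERM", "COMP CREG, N", "BC LE, AGAIN", "DIV BREG, TWO",
    "MOVEM BREG, RESULT", "PRINT RESULT", "STOP", "N DS 1", "RESULT DS 1",
    "ONE DC '1'", "TERM DS 1", "TWO DC '2'", "END",
]
symtab = analyse(PROGRAM)
code = synthesise(PROGRAM, symtab)
print(symtab["ONE"], code[102])   # 116 (4, 2, 116)
```

Because synthesis looks operand addresses up in the symbol table, inserting or removing a statement only requires running both phases again; no operand in the source program has to be edited by hand.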


1.4.2 PASS STRUCTURE OF ASSEMBLERS

We have defined a pass of a language processor as one complete scan of the source program, or its equivalent representation. We discuss two pass and single pass assembly schemes in this section.

Two pass translation
Two pass translation of an assembly language program can handle forward references easily. LC processing is performed in the first pass, and symbols defined in the program are entered into the symbol table. The second pass synthesizes the target form using the address information found in the symbol table. In effect, the first pass performs analysis of the source program while the second pass performs synthesis of the target program. The first pass constructs an intermediate representation (IR) of the source program for use by the second pass (see Fig. 4.7). This representation consists of two main components: data structures, e.g. the symbol table, and a processed form of the source program. The latter component is called intermediate code (IC).

1.4.2.1 Single pass translation

LC processing and construction of the symbol table proceed as in two pass translation. The problem of forward references is tackled using a process called backpatching. The operand field of an instruction containing a forward reference is left blank initially. The address of the forward referenced symbol is put into this field when its definition is encountered. The instruction corresponding to the statement
MOVER BREG, ONE

can be only partially synthesized since ONE is a forward reference. Hence the instruction opcode and address of BREG will be assembled to reside in location 101. The need for inserting the second operand's address at a later stage can be indicated by adding an entry to the Table of Incomplete Instructions (TII). This entry is a pair (<instruction address>, <symbol>), e.g. (101, ONE) in this case.
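A minimal Python sketch of backpatching with a TII is given below. It fills in all incomplete instructions once the end of the program is reached. The tuple-based statement format, the opcode and register codes, the one-word instruction length and the function name assemble_single_pass are assumptions made for illustration.

```python
# Assumed encodings, for illustration only.
OPCODES = {"MOVER": 4, "ADD": 1, "STOP": 0}
REGISTERS = {None: 0, "AREG": 1, "BREG": 2}

def assemble_single_pass(statements, start):
    """Single pass sketch. statements: (label, mnemonic, register, operand) tuples.

    Returns (code, tii), where code maps address -> [opcode, reg, address field].
    Every statement is assumed to occupy one memory word.
    """
    symtab = {}
    tii = []                              # Table of Incomplete Instructions
    code = {}
    lc = start
    for label, mnemonic, reg, operand in statements:
        if label is not None:
            symtab[label] = lc
        if mnemonic == "DC":
            code[lc] = [0, 0, operand]    # machine form of the constant
        else:
            if operand is None:
                address = 0
            elif operand in symtab:
                address = symtab[operand] # backward reference: already known
            else:
                address = None            # forward reference: leave blank
                tii.append((lc, operand)) # remember (<instruction address>, <symbol>)
            code[lc] = [OPCODES[mnemonic], REGISTERS[reg], address]
        lc += 1
    # End of program: every symbol is now defined, so backpatch each TII entry.
    for instr_address, symbol in tii:
        code[instr_address][2] = symtab[symbol]
    return code, tii

# Hypothetical program starting at address 101.
program = [
    (None, "MOVER", "BREG", "ONE"),
    (None, "ADD", "BREG", "ONE"),
    (None, "STOP", None, None),
    ("ONE", "DC", None, 1),
]
code, tii = assemble_single_pass(program, 101)
print(tii)
print(code[101])
```

In this sketch the TII is kept after processing so that it can be inspected; an assembler would normally discard each entry once the instruction has been completed.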


By the time the END statement is processed, the symbol table would contain the addresses of all symbols defined in the source program and TII would contain information describing all forward references. The assembler can now process each entry in TII to complete the concerned instruction. For example, the entry (101, ONE) would be processed by obtaining the address of ONE from the symbol table and inserting it in the operand address field of the instruction with assembled address 101. Alternatively, entries in TII can be processed in an incremental manner. Thus, when the definition of some symbol is encountered, all forward references to that symbol can be processed.

1.4.2.2 DESIGN OF A TWO PASS ASSEMBLER

Tasks performed by the passes of a two pass assembler are as follows:

Pass I
1. Separate the symbol, mnemonic opcode and operand fields.
2. Build the symbol table.
3. Perform LC processing.
4. Construct intermediate representation.

Pass II
1. Synthesize the target program.

Pass I performs analysis of the source program and synthesis of the intermediate representation while Pass II processes the intermediate representation to synthesize the target program.

Pass I of the Assembler
Pass I uses the following data structures:
OPTAB: A table of mnemonic opcodes and related information
SYMTAB: Symbol table
LITTAB: A table of literals used in the program

Figure 4.9 illustrates sample contents of these tables while processing the program of Fig. 4.8. OPTAB contains the fields mnemonic opcode, class and mnemonic info. The class field indicates whether the opcode corresponds to an imperative statement (IS), a declaration statement (DL) or an assembler directive (AD). If the statement is an imperative, the mnemonic info field contains the pair (machine opcode, instruction length); otherwise it contains the id of a routine to handle the declaration or directive statement. A SYMTAB entry contains the fields address and length. A LITTAB entry contains the fields literal and address.


Processing of an assembly statement begins with the processing of its label field. If it contains a symbol, the symbol and the value in LC are copied into a new entry of SYMTAB. Thereafter, the functioning of Pass I centers around the interpretation of the OPTAB entry for the mnemonic. The class field of the entry is examined to determine whether the mnemonic belongs to the class of imperative, declaration or assembler directive statements. In the case of an imperative statement, the length of the machine instruction is simply added to the LC. The length is also entered in the SYMTAB entry of the symbol (if any) defined in the statement. This completes the processing of the statement.

The use of LITTAB needs some explanation. The first pass uses LITTAB to collect all literals used in a program. Awareness of different literal pools is maintained using the auxiliary table POOLTAB. This table contains the literal number of the starting literal of each literal pool. At any stage, the current literal pool is the last pool in LITTAB. On encountering an LTORG statement (or the END statement), literals in the current pool are allocated addresses starting with the current value in LC, and LC is appropriately incremented.
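The Pass I processing described above, including literal pools, can be sketched in Python as follows. This is an illustration under assumptions: the OPTAB contents, the routine ids (R_START and so on), the one-word size of a literal, the tuple-based statement format and the sample program are all hypothetical and do not reproduce Fig. 4.8 or Fig. 4.9.

```python
# OPTAB: mnemonic -> (class, mnemonic info). For IS entries the info is
# (machine opcode, instruction length); otherwise it is a routine id.
# All values here are assumptions for illustration.
OPTAB = {
    "MOVER": ("IS", (4, 1)),
    "ADD": ("IS", (1, 1)),
    "STOP": ("IS", (0, 1)),
    "DS": ("DL", "R_DS"),
    "DC": ("DL", "R_DC"),
    "START": ("AD", "R_START"),
    "LTORG": ("AD", "R_LTORG"),
    "END": ("AD", "R_END"),
}

def pass_one(statements):
    """statements: (label or None, mnemonic, list of operands) tuples."""
    symtab = {}       # symbol -> [address, length]
    littab = []       # entries are [literal, address]
    pooltab = [0]     # LITTAB index of the first literal of each pool
    lc = 0
    for label, mnemonic, operands in statements:
        if label is not None:
            symtab[label] = [lc, None]
        klass, info = OPTAB[mnemonic]
        # Collect literals (operands starting with '=') into the current pool.
        for op in operands:
            current_pool = [lit for lit, _ in littab[pooltab[-1]:]]
            if op.startswith("=") and op not in current_pool:
                littab.append([op, None])
        length = 0
        if klass == "IS":
            length = info[1]              # instruction length from OPTAB
        elif info == "R_START":
            lc = int(operands[0])
        elif info == "R_DS":
            length = int(operands[0])
        elif info == "R_DC":
            length = 1
        elif info in ("R_LTORG", "R_END"):
            # Allocate addresses to the literals of the current pool.
            for entry in littab[pooltab[-1]:]:
                entry[1] = lc
                lc += 1                   # one word per literal (assumed)
            if info == "R_LTORG":
                pooltab.append(len(littab))  # the next literal starts a new pool
        lc += length
        if label is not None:
            symtab[label][1] = length
    return symtab, littab, pooltab

# Hypothetical sample program.
program = [
    (None, "START", ["200"]),
    (None, "MOVER", ["AREG", "='5'"]),
    ("LOOP", "ADD", ["AREG", "='1'"]),
    (None, "LTORG", []),
    (None, "MOVER", ["BREG", "='1'"]),
    (None, "STOP", []),
    ("A", "DS", ["2"]),
    (None, "END", []),
]
symtab, littab, pooltab = pass_one(program)
print(symtab)
print(littab)
print(pooltab)
```

Note that the literal ='1' appears twice in LITTAB, once in each pool, because a literal is only shared within the pool in which it is used.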
