
System Software

Module 1
General concepts: Review of assembly and machine language
programming, distinction between system software and application
software. Language processors: Introduction, Language
processing activities.

Assemblers: Elements of assembly language programming, A
simple assembly scheme, Pass structure of assemblers, Design of
two pass assemblers.

Language Processors

Introduction

Language processing activities arise due to the differences between the manner in which a software
designer describes the ideas concerning the behaviour of software and the manner in which these
ideas are implemented in a computer system.

Application domain: The designer expresses the ideas in terms related to the application domain
of the software.
Execution domain: To implement the ideas, their description has to be interpreted in terms
related to the execution domain of the computer system.
PL domain: Software implementation using a programming language (PL) introduces a new domain
called the PL domain.
Semantic gap: Represents the difference between the semantics of two domains. (Semantics are
the rules of meaning of a domain.)
[Figure: the semantic gap lies between the application domain and the execution domain.]

Specification gap (specification-and-design gap): The semantic gap between two specifications of
the same task (the gap between the application domain and the PL domain).

Execution gap: The gap between the semantics of programs (that perform the same task) written in
different programming languages (the gap between the PL domain and the execution domain).

[Figure: Application domain -> PL domain -> Execution domain. The specification gap lies between the application domain and the PL domain; the execution gap lies between the PL domain and the execution domain.]

• The specification gap is bridged by the software development team.
• The execution gap is bridged by the designer of the programming language
processor (translator or interpreter).
• We assume that each domain has a
specification language (SL).
• A specification written in an SL is a
program in SL.
• The specification language of the PL
domain is the PL itself.
• The specification language of the execution
domain is the machine language.

Definition of Language Processor

A language processor is software which bridges a
specification or execution gap.

Source Program and Target Program

The input to a language processor is called the source program
and its output is called the target program.

The languages in which these programs are written are called the
source language and the target language respectively.

• A language translator bridges an execution gap
to the machine language of a computer system.
• A detranslator bridges the same execution gap,
but in the reverse direction.
• A preprocessor is a language processor which
bridges an execution gap but is not a language
translator.
• A language migrator bridges the specification
gap between two PLs.
Language processing activities

The language processing activities can be divided into those that bridge the
specification gap and those that bridge the execution gap. The activities are

1. Program generation activities
2. Program execution activities

• A program generation activity aims at automatic generation of a program.
• A program execution activity organizes the execution of a program written in a PL on
a computer system.

Program Generation

The following figure shows the program generation activity.

[Figure: Program specification -> Program generator -> Program in target PL; the program generator also reports errors.]

The program generator is a software system which accepts the specification of a
program to be generated, and generates a program in the target PL.

The program generator introduces a new domain called the program generator
domain.

[Figure: Application domain, program generator domain, target PL domain and execution domain, with the specification gap marked.]
Program Execution
Two popular models for program execution are translation and interpretation.

Program translation
The program translation model bridges the execution gap by translating a
program written in a PL (the source program) into an equivalent program in the
machine or assembly language of the computer system (the target program).

[Figure: Source program -> Translator -> Target program (machine language program); the translator also reports errors.]
Characteristics of the program translation model

• A program must be translated before it can be
executed.
• The translated program may be saved in a file.
The saved program may be executed repeatedly.
• A program must be retranslated following
modifications.


Program interpretation

The interpreter reads the source program and stores it in its
memory.

During interpretation it takes a source statement, determines its
meaning and performs actions which implement it.

Characteristics of interpretation

• The source program is retained in the source form itself, i.e. no
target program form exists.
• A statement is analyzed during its interpretation.


Fundamentals of Language Processing

Language Processing ≡ Analysis of SP + Synthesis of TP,
where SP is the source program and TP is the target program.

• The collection of language processor components engaged in analyzing a source
program is the analysis phase of the language processor.
• Components engaged in synthesizing a target program constitute the synthesis
phase.
• A specification of the source language forms the basis of source program
analysis. The specification consists of three components:
1. Lexical rules which govern the formation of valid lexical units in the source
language.
2. Syntax rules which govern the formation of valid statements in the source
language.
3. Semantic rules which associate meaning with valid statements of the language.

The analysis phase uses each component of the source
language specification to determine relevant information
concerning a statement in the source program.

Thus, analysis of a source statement consists of lexical, syntax
and semantic analysis.

The synthesis phase is concerned with the construction of
target language statements which have the same meaning as a
source statement.

This consists of two main activities:

• Creation of data structures in the target program (memory
allocation)
• Generation of target code (code generation)

E.g. consider the statement
percent_profit := (profit * 100) / cost_price;
Lexical analysis identifies :=, * and / as operators, 100
as a constant and the remaining strings as identifiers.
Syntax analysis identifies the statement as an
assignment statement with percent_profit as the LHS
and (profit * 100) / cost_price as the expression on the
RHS.
Semantic analysis determines the meaning of the
statement to be the assignment of
(profit * 100) / cost_price
to percent_profit.
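
The lexical analysis step of this example can be sketched in Python. This is only an illustration: the token class names (OPERATOR, CONSTANT, IDENTIFIER, PUNCT) and the regular expressions are assumptions made for this sketch, not definitions from the notes.

```python
import re

# Hypothetical token classes for the example statement.
TOKEN_SPEC = [
    ("OPERATOR", r":=|[*/+\-]"),
    ("CONSTANT", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_]\w*"),
    ("PUNCT", r"[();]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(statement):
    """Split a statement into (token class, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(statement):
        match = TOKEN_RE.match(statement, pos)
        if match is None:
            raise ValueError(f"invalid lexical unit at position {pos}")
        if match.lastgroup != "SKIP":  # white space separates tokens only
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


tokens = lex("percent_profit := (profit * 100) / cost_price;")
for token_class, text in tokens:
    print(token_class, text)
```

The operators, the constant 100 and the three identifiers come out in separate classes, which is the information the syntax analysis step works with.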
Phases of a language processor
[Figure: Schematic of a language processor. Source program -> Analysis phase -> Synthesis phase -> Target program; both phases report errors.]
ASSEMBLERS
An assembly language program must be translated into
machine code by a separate program called an assembler.

The input to an assembler is called the source
program.

The output is a machine language translation called the object
program.

[Figure: Assembly language program -> Assembler -> Machine language program]
Elements of Assembly Language Programming

An assembly language is a machine dependent, low level
programming language.

Compared to machine language, it provides 3 basic features.

1. Mnemonic operation codes: Use of mnemonic opcodes for machine
instructions eliminates the need to memorize numeric operation codes.

2. Symbolic operands: Symbolic names can be associated with data or
instructions. These symbolic names can be used as operands in assembly
statements. The assembler performs memory bindings to these names;
the programmer need not know any details of the memory bindings
performed by the assembler.

3. Data declarations: Data can be declared in a variety of notations,
including decimal notation. This avoids manual conversion of
constants into their internal machine representation.
Statement format

[Label] <opcode> <operand spec>[, <operand spec> ...]

[...] indicates that the enclosed specification is optional.

The label is associated as a symbolic name with the memory
word(s) generated for the statement.
<operand spec> has the following syntax:
<symbolic name>[+<displacement>][(<index register>)]
E.g. AREA :- refers to the memory word with which the name AREA is associated.
AREA+5 :- refers to the memory word 5 words away from the word with
the name AREA.
AREA(4) :- implies indexing with index register 4, i.e. the operand address is
obtained by adding the contents of index register 4 to the address of AREA.
AREA+5(4) :- a combination of the previous two specifications.
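
As an illustration, the operand specification syntax above can be parsed with a regular expression. This is a minimal sketch: the function names, the address 200 for AREA and the register contents are hypothetical values chosen for the example.

```python
import re

# <symbolic name>[+<displacement>][(<index register>)]
OPERAND_RE = re.compile(
    r"^\s*(?P<name>[A-Za-z_]\w*)"
    r"(?:\s*\+\s*(?P<disp>\d+))?"
    r"(?:\s*\(\s*(?P<index>\d+)\s*\))?\s*$"
)


def parse_operand(spec):
    """Return (symbolic name, displacement, index register or None)."""
    match = OPERAND_RE.match(spec)
    if match is None:
        raise ValueError(f"invalid operand specification: {spec!r}")
    disp = int(match["disp"] or 0)
    index = int(match["index"]) if match["index"] else None
    return match["name"], disp, index


def effective_address(spec, symtab, index_registers):
    """Address of the symbol + displacement + contents of the index register."""
    name, disp, index = parse_operand(spec)
    address = symtab[name] + disp
    if index is not None:
        address += index_registers[index]
    return address


# Hypothetical values: AREA bound to address 200, index register 4 holds 10.
print(parse_operand("AREA +5(4)"))
print(effective_address("AREA+5(4)", {"AREA": 200}, {4: 10}))
```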
A simple assembly language

In this language, each statement has 2 operands.

The first operand is always a register, which can be
any one of AREG, BREG, CREG and DREG.

The second operand refers to a memory word using a
symbolic name and an optional displacement.
The following table lists the mnemonic opcodes for machine instructions.

Instruction opcode    Assembly mnemonic
00                    STOP
01                    ADD
02                    SUB
03                    MULT
04                    MOVER
05                    MOVEM
06                    COMP
07                    BC (branch on condition)
08                    DIV
09                    READ
10                    PRINT
Instruction Format

sign | opcode | register operand | memory operand

• The opcode, register operand and memory operand occupy 2, 1 and 3
digits respectively.
• The sign is not a part of the instruction.

Examples of assembly statements:
START 101
MULT BREG, TERM
ONE DC ‘1’
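
The instruction format can be illustrated with a small formatting helper. This is a sketch under assumptions: the register codes 1 to 3 for AREG, BREG and CREG follow the machine code shown in the listings of these notes, the code 4 for DREG is inferred, and the address 117 for TERM is taken from the sample program given later.

```python
OPCODES = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5,
           "COMP": 6, "BC": 7, "DIV": 8, "READ": 9, "PRINT": 10}
# Register codes; the code for DREG is an assumption.
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def encode(mnemonic, register_code, address):
    """Format an instruction as: sign, 2-digit opcode, 1-digit register, 3-digit address."""
    return f"+ {OPCODES[mnemonic]:02d} {register_code} {address:03d}"


# MULT BREG, TERM with TERM assumed to be at address 117
print(encode("MULT", REGISTERS["BREG"], 117))
```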
Assembly language Statements

There are 3 types of statements:

1. Imperative statements
2. Declaration statements
3. Assembler directives

I. Imperative statements

An imperative statement indicates an action to be performed
during the execution of the assembled program.

Each imperative statement translates into one machine
instruction (e.g. ADD, SUB, ...).
For example, ADD is translated to opcode 01 and SUB to opcode 02.
II. Declaration statements

Declaration statements allocate and initialize memory for variables.

Syntax:
[Label] DS <constant>
[Label] DC ‘<value>’

The DS (declare storage) statement reserves areas of memory and associates names
with them.
The DC (declare constant) statement constructs memory words containing constants.
E.g.

A DS 1 :- reserves a memory area of 1 word and associates the name A with it.
G DS 200 :- reserves a block of 200 memory words. The name G is associated
with the first word of the block. Other words in the block can be accessed by adding an
offset, e.g. G+5.

ONE DC ‘1’ :- associates the name ONE with a memory word containing
the value ‘1’.
Use of constants

An assembly program can use constants in 2 ways:

1. Immediate operands
2. Literals
Immediate operands

An immediate operand can be used in an assembly statement only if the architecture
of the target machine includes the necessary features.
In such a machine, the assembly statement
ADD AREG, 5
is translated into an instruction with 2 operands: AREG and the
value ‘5’ as an immediate operand. (The assembly language of the 8086
supports immediate operands.)
Literals
A literal is an operand with the syntax =‘<value>’.
It differs from a constant because its location cannot be
specified in the assembly program.
Its value is not changed during the execution of the
program.
No architectural provision is needed to support its use, because the
assembler treats a literal like a constant declared with DC:

ADD AREG, =‘5’    is equivalent to    ADD AREG, FIVE
                                      -----
                                      FIVE DC ‘5’
III. Assembler directives

Assembler directives instruct the assembler to perform certain
actions during the assembly of a program.

START <constant>

Indicates that the first word of the target program generated by the
assembler should be placed in the memory word with address
<constant>.

END [<operand spec>]

Indicates the end of the source program. The optional <operand spec>
indicates the address of the instruction where the execution of the
program should begin. (By default, execution begins with the first
instruction of the assembled program.)
Assembler’s functions

• Convert mnemonic operation codes to their machine language
equivalents
• Convert symbolic operands to their equivalent machine
addresses
• Build the machine instructions in the proper format
• Convert the data constants to internal machine representations


A Simple Assembly Scheme

Process of Translation

The process of translation of an assembly language program to machine
language can be expressed as

Analysis of source text + Synthesis of target text = Translation from source text to target text
In the analysis phase, we determine the meaning of a source
language text.
To determine the meaning, we must know the rules according
to which the source language statements are constructed, i.e.
we must know the grammar of the source language.

We must also know how to determine the meaning of a statement
once its grammatical structure is known.
The rules of grammar are the syntax, and the rules of meaning are the
semantics, of the language.
E.g. AGAIN MOVER RESULT+4

The rules of writing (syntax) an assembly language statement tell us that

i. AGAIN appears in the label field
ii. MOVER appears in the mnemonic opcode field
iii. RESULT+4 appears in the operand field.

The rules of meaning (semantics) tell us that

i. AGAIN is a name given to this statement as a whole
ii. MOVER is its opcode mnemonic
iii. RESULT+4 is its operand.
In the synthesis phase, we select the appropriate machine
operation code for the mnemonic MOVER and place
it in the machine instruction’s opcode field.

We then evaluate the address corresponding to
RESULT+4 and place it in the address field of the
machine instruction.
Design specification of an assembler
The four step approach for the design specification of an assembler
1. Identify the information necessary to perform the task
2. Design a suitable data structure to record the information
3. Determine the processing to obtain and maintain the information
4. Determine the processing necessary to perform the task

Phases of Assembler

The assembler has two phases: the analysis phase and the synthesis phase.

Analysis phase

o The primary function is the building of the symbol table.
o To build the symbol table, it must determine the addresses with
which the symbolic names used in a program are associated.
o To determine the address of a symbol such as N, we must fix the addresses of all
program elements preceding it. This function is called memory
allocation.
o To implement memory allocation, a data structure called the
location counter (LC) is introduced.
o The LC contains the address of the next memory word in the
target program.
o The LC is initialized to the constant specified in the START
statement.
o If a label is present in an assembly statement, a new
entry (symbol, <LC contents>) is made in the symbol table.
o The analysis phase then finds the number of memory words required by the
assembly statement and updates the LC contents. This is called LC
(location counter) processing.
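
The LC processing described above can be sketched as follows. This is an illustrative sketch only: statements are assumed to be already split into (label, mnemonic, operand) tuples, every imperative statement and every DC constant is assumed to occupy one word, and the short program is hypothetical.

```python
def build_symbol_table(source):
    """Return {symbol: address} for a list of (label, mnemonic, operand) tuples."""
    symtab = {}
    lc = 0  # location counter
    for label, mnemonic, operand in source:
        if mnemonic == "START":
            lc = int(operand)        # LC is initialized from START
            continue
        if mnemonic == "END":
            break
        if label:
            symtab[label] = lc       # entry (symbol, <LC contents>)
        # DS reserves <constant> words; everything else takes one word here
        lc += int(operand) if mnemonic == "DS" else 1
    return symtab


# Hypothetical program used only to exercise the sketch
program = [
    (None, "START", "101"),
    (None, "READ", "N"),
    ("AGAIN", "MULT", "BREG, TERM"),
    (None, "STOP", None),
    ("N", "DS", "1"),
    ("G", "DS", "200"),
    ("ONE", "DC", "'1'"),
    (None, "END", None),
]
symtab = build_symbol_table(program)
print(symtab)
```

Because G reserves 200 words starting at address 105, the next symbol ONE is bound to address 305.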


Synthesis phase

1. Obtain the machine opcode corresponding to the mnemonic from the
mnemonic table
2. Obtain the address of the memory operand from the symbol table
3. Synthesize a machine instruction

E.g. synthesizing the statement
MOVER BREG, ONE
requires
a) the address of the memory variable ‘ONE’
b) the machine opcode corresponding to the mnemonic MOVER

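
[Figure: Analysis phase: takes the source code as input, builds the symbol table, refers to the mnemonic table, and outputs the data structures (symbol table) and a processed form of the source program (the intermediate representation). Synthesis phase: takes the intermediate representation as input, refers to the symbol table and the mnemonic table, and outputs the object code.]
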
Mnemonic Table

Mnemonic    Opcode    Length
ADD         01        1
SUB         02        1

Symbol Table

Symbol    Address
AGAIN     104
N         114

[Figure: Source program -> Analysis phase -> Synthesis phase -> Object program, with arrows for data access and control transfer between the phases and the two tables.]
Fig: Data structures of the assembler
PASS STRUCTURES OF ASSEMBLERS

Two Pass Translation


Pass 1: Analysis phase
Pass 2: Synthesis phase

The first pass constructs an intermediate representation (IR) of the source
program for use by the second pass.

This representation consists of two main components:

(1) Data structures (e.g. the symbol table)
(2) A processed form of the source program (intermediate code, IC)


        START   101
        READ    N               101) + 09 0 114
        MOVER   BREG, ONE       102) + 04 2 116
        MOVEM   BREG, TERM      103) + 05 2 117
AGAIN   MULT    BREG, TERM      104) + 03 2 117
        MOVER   CREG, TERM      105) + 04 3 117
        ADD     CREG, ONE       106) + 01 3 116
        MOVEM   CREG, TERM      107) + 05 3 117
        COMP    CREG, N         108) + 06 3 114
        BC      LE, AGAIN       109) + 07 2 104
        DIV     BREG, TWO       110) + 08 2 118
        MOVEM   BREG, RESULT    111) + 05 2 115
        PRINT   RESULT          112) + 10 0 115
        STOP                    113) + 00 0 000
N       DS      1               114)
RESULT  DS      1               115)
ONE     DC      ‘1’             116) + 00 0 001
TERM    DS      1               117)
TWO     DC      ‘2’             118) + 00 0 002
        END

Fig: An assembly language program and its equivalent machine language program
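
The two passes can be tried out on this program with a short Python sketch. It is not the assembler design given in these notes, only a compact illustration under assumptions: labels are recognized as any first field that is not a known mnemonic, every instruction and DC constant occupies one word, and the condition codes for EQ, GT and GE are inferred from the stated range 1 to 6 for LT to ANY.

```python
OPCODES = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5,
           "COMP": 6, "BC": 7, "DIV": 8, "READ": 9, "PRINT": 10}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}
CONDITIONS = {"LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}
KEYWORDS = set(OPCODES) | {"START", "END", "DS", "DC"}

SOURCE = """\
        START 101
        READ N
        MOVER BREG, ONE
        MOVEM BREG, TERM
AGAIN   MULT BREG, TERM
        MOVER CREG, TERM
        ADD CREG, ONE
        MOVEM CREG, TERM
        COMP CREG, N
        BC LE, AGAIN
        DIV BREG, TWO
        MOVEM BREG, RESULT
        PRINT RESULT
        STOP
N       DS 1
RESULT  DS 1
ONE     DC '1'
TERM    DS 1
TWO     DC '2'
        END
"""


def parse(line):
    """Separate the label, mnemonic and operand fields of one statement."""
    fields = line.replace(",", " ").split()
    label = fields.pop(0) if fields[0] not in KEYWORDS else None
    return label, fields[0], fields[1:]


def pass1(lines):
    """Analysis: build the symbol table and an intermediate representation."""
    symtab, ir, lc = {}, [], 0
    for line in lines:
        label, op, args = parse(line)
        if op == "START":
            lc = int(args[0])
            continue
        if op == "END":
            break
        if label:
            symtab[label] = lc
        ir.append((lc, op, args))
        lc += int(args[0]) if op == "DS" else 1
    return symtab, ir


def pass2(symtab, ir):
    """Synthesis: produce {address: machine word} from the IR."""
    code = {}
    for lc, op, args in ir:
        if op == "DS":
            continue                      # storage is reserved, nothing generated
        if op == "DC":
            value = int(args[0].strip("'"))
            code[lc] = f"+ 00 0 {value:03d}"
            continue
        reg = 0
        if len(args) == 2:
            reg = CONDITIONS[args[0]] if op == "BC" else REGISTERS[args[0]]
        address = symtab[args[-1]] if args else 0
        code[lc] = f"+ {OPCODES[op]:02d} {reg} {address:03d}"
    return code


symtab, ir = pass1(SOURCE.splitlines())
code = pass2(symtab, ir)
for address in sorted(code):
    print(f"{address}) {code[address]}")
```

Forward references such as N in READ N cause no difficulty here because the symbol table is complete before the second pass starts.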


Single Pass Translation
LC processing and construction of the symbol table proceed as in two pass
translation.

Problem: the forward reference problem, i.e. a symbol may be used before it is defined.
Solution: backpatching.

The operand field of an instruction containing a forward reference is left
blank initially. The address of the forward referenced symbol is put into this
field when its definition is encountered.
For example, the statement
MOVER BREG, ONE
can be only partially synthesized since ONE is a forward reference.
Hence the instruction opcode and the address of BREG will be assembled
to reside in location 101. The need for inserting the second operand’s
address at a later stage can be indicated by adding an entry to the
Table of Incomplete Instructions (TII).

The TII entry is a pair (<instruction address>, <symbol>),
e.g. (101, ONE).

By the time the END statement is processed, the symbol table would
contain the addresses of all symbols defined in the source program and
the TII would contain information describing all forward references.

The assembler can now process each entry in the TII to complete the
concerned instruction.
For example, the entry (101, ONE) would be processed by obtaining the
address of ONE from the symbol table and inserting it in the operand
address field of the instruction with assembled address 101.
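
Backpatching with a TII can be sketched as below. This is a minimal illustration under assumptions: statements are given as (label, mnemonic, operands) tuples, each machine word is modelled as a list [opcode, register, address], and the four-statement program is hypothetical, chosen so that MOVER BREG, ONE resides in location 101 as in the text.

```python
OPCODES = {"STOP": 0, "MOVER": 4}          # subset needed for the example
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}


def assemble_single_pass(source):
    symtab, tii, memory, lc = {}, [], {}, 0
    for label, op, args in source:
        if op == "START":
            lc = int(args[0])
            continue
        if op == "END":
            break
        if label:
            symtab[label] = lc
        if op == "DS":
            lc += int(args[0])
            continue
        if op == "DC":
            memory[lc] = [0, 0, int(args[0])]
        else:
            reg = REGISTERS[args[0]] if len(args) == 2 else 0
            address = 0
            if args:
                symbol = args[-1]
                if symbol in symtab:
                    address = symtab[symbol]
                else:
                    address = None            # operand field left blank
                    tii.append((lc, symbol))  # (<instruction address>, <symbol>)
            memory[lc] = [OPCODES[op], reg, address]
        lc += 1
    # END reached: complete every incomplete instruction from the symbol table.
    # An undefined symbol would raise KeyError here.
    for instruction_address, symbol in tii:
        memory[instruction_address][2] = symtab[symbol]
    return memory, tii


program = [
    (None, "START", ["101"]),
    (None, "MOVER", ["BREG", "ONE"]),
    (None, "STOP", []),
    ("ONE", "DC", ["1"]),
    (None, "END", []),
]
memory, tii = assemble_single_pass(program)
print(tii)
print(memory)
```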
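Backpatching

• Uses the TII (Table of Incomplete Instructions), whose entries have the
fields (<instn addr>, <symbol>).
• Completion is done at the end, i.e. when the END statement is processed, by
referring to the TII for the incomplete instructions and to the
symbol table (ST) for the addresses.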
[Figure: Overview of two pass assembly. Source program -> Pass 1 -> intermediate code -> Pass 2 -> target program; both passes access the data structures. The legend distinguishes data access from control transfer.]



Design of a Two Pass Assembler
The tasks performed by a two pass assembler are

Pass I
1. Separate the symbol, mnemonic opcode and operand fields
2. Build the symbol table
3. Perform LC processing
4. Construct the intermediate representation

Pass II
Synthesize the target program

Pass I of the Assembler

Pass I uses the following data structures:

OPTAB     A table of mnemonic opcodes and related information
SYMTAB    The symbol table
LITTAB    A table of literals used in the program

 1          START   200
 2          MOVER   AREG, =‘5’      200) + 04 1 211
 3          MOVEM   AREG, A         201) + 05 1 217
 4  LOOP    MOVER   AREG, A         202) + 04 1 217
 5          MOVER   CREG, B         203) + 04 3 218
 6          ADD     CREG, =‘1’      204) + 01 3 212
 7          ……
12          BC      ANY, NEXT       210) + 07 6 214
13          LTORG
                    =‘5’            211) + 00 0 005
                    =‘1’            212) + 00 0 001
14          ….
15  NEXT    SUB     AREG, =‘1’      214) + 02 1 219
16          BC      LT, BACK        215) + 07 1 202
17  LAST    STOP                    216) + 00 0 000
18          ORIGIN  LOOP+2
19          MULT    CREG, B         204) + 03 3 218
20          ORIGIN  LAST+1
21  A       DS      1               217)
22  BACK    EQU     LOOP
23  B       DS      1               218)
24          END
25                  =‘1’            219) + 00 0 001

• Processing of an assembly statement begins with the processing of its
label field.
• After label processing, the mnemonic of every source statement is
isolated and searched for in OPTAB.
• If an entry is found, its second field (the class field) is examined to
determine whether the mnemonic belongs to the class IS, DL
or AD.
• For an imperative statement, the mnemonic info field contains the
pair (machine opcode, instruction length).
• For both assembler directives and declaration statements, the
‘Routine id’ field contains the identifier of a routine which
performs the appropriate processing for the statement.
• The first pass uses LITTAB to collect all literals used in a program.
• Awareness of different literal pools is maintained using the
auxiliary table POOLTAB. This table contains the literal number of
the starting literal of each literal pool.
• At every LTORG statement (and at the END statement), the assembler allocates
memory to the literals of a literal pool.

OPTAB

Mnemonic opcode    Class    Mnemonic info
MOVER              IS       (04,1)
DS                 DL       R#7
START              AD       1
.
.

SYMTAB

Symbol    Address    Length
LOOP      202        1
NEXT      214        1
LAST      216        1
A         217        1
BACK      202        1
B         218        1

LITTAB

No.    Literal    Address
1      =‘5’
2      =‘1’
3      =‘1’

POOLTAB

Literal no.
#1
#3
-

Fig: Data structures of assembler Pass I



OPTAB contains the fields mnemonic opcode, class and mnemonic info. The
class field indicates whether the opcode corresponds to an imperative
statement (IS), a declaration statement (DL) or an assembler directive (AD).
If an IS, the mnemonic info field contains the pair (machine opcode,
instruction length); otherwise it contains the id of a routine to handle the declaration
or directive statement.

A SYMTAB entry contains the fields symbol, address and length. If the assembly
statement contains a label, the symbol and the value of LC are copied into a
new entry of SYMTAB.

A LITTAB entry contains the fields literal and address.

POOLTAB: Awareness of different literal pools is maintained using this table. It
contains the literal number of the starting literal of each literal pool.
Algorithm- Assembler First Pass

1. loc_cntr := 0; (default value)
   pooltab_ptr := 1;
   POOLTAB[1] := 1;
   littab_ptr := 1;

2. While next statement is not an END statement

   (a) If label is present then
         this_label := symbol in label field;
         Enter (this_label, loc_cntr) in SYMTAB.

   (b) If an LTORG statement then
       (i) Process literals LITTAB[POOLTAB[pooltab_ptr]] ...
           LITTAB[littab_ptr-1] to allocate memory and put the
           address in the address field. Update loc_cntr accordingly.
       (ii) pooltab_ptr := pooltab_ptr + 1;
       (iii) POOLTAB[pooltab_ptr] := littab_ptr;

   (c) If a START or ORIGIN statement then
         loc_cntr := value specified in operand field;

   (d) If an EQU statement then
       (i) this_addr := value of <address spec>;
       (ii) Correct the SYMTAB entry for this_label to
            (this_label, this_addr).

   (e) If a declaration statement then
       Invoke the routine whose id is mentioned in the
       mnemonic info field. This routine returns code
       and size:
       (i) code := code of the declaration statement;
       (ii) size := size of memory area required by DC/DS;
       (iii) loc_cntr := loc_cntr + size;
       (iv) Generate IC ‘(DL, code) ...’.

   (f) If an imperative statement then
       (i) code := machine opcode from OPTAB;
       (ii) loc_cntr := loc_cntr + instruction length from
            OPTAB;
       (iii) If operand is a literal then
               this_literal := literal in operand field;
               LITTAB[littab_ptr] := this_literal;
               littab_ptr := littab_ptr + 1;
             else (i.e. operand is a symbol)
               this_entry := SYMTAB entry number of
               operand;
               Generate IC ‘(IS, code)(S, this_entry)’;

3. (Processing of END statement)
   (a) Perform step 2(b).
   (b) Generate IC ‘(AD,02)’.
   (c) Go to Pass II.
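
The literal handling of steps 2(b), 2(f)(iii) and 3(a) can be sketched in Python. This sketch covers only LITTAB and POOLTAB: statements are simplified to (mnemonic, operand) pairs, every statement other than a directive is assumed to occupy one word, Python lists are indexed from 0 whereas the algorithm counts from 1, and the program is a hypothetical abridged one.

```python
def pass1_literals(source):
    littab = []     # entries: [literal, address or None]
    pooltab = [0]   # 0-based LITTAB index of the first literal of each pool
    lc = 0

    def allocate_pool():
        """Step 2(b): give addresses to the literals of the current pool."""
        nonlocal lc
        for entry in littab[pooltab[-1]:]:
            entry[1] = lc
            lc += 1
        pooltab.append(len(littab))  # the next pool starts after the last literal

    for mnemonic, operand in source:
        if mnemonic in ("START", "ORIGIN"):
            lc = int(operand)
        elif mnemonic == "LTORG":
            allocate_pool()
        elif mnemonic == "END":
            allocate_pool()          # step 3(a): perform step 2(b)
            break
        else:
            if operand and operand.startswith("="):
                littab.append([operand, None])   # step 2(f)(iii)
            lc += 1
    return littab, pooltab


program = [
    ("START", "200"),
    ("MOVER", "='5'"),
    ("ADD", "='1'"),
    ("LTORG", None),
    ("SUB", "='1'"),
    ("STOP", None),
    ("END", None),
]
littab, pooltab = pass1_literals(program)
print(littab)
print(pooltab)
```

The second use of ='1' falls in a new pool, so it receives its own LITTAB entry and its own address, just as the literal at address 219 does in the program above.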
Mnemonic Field

The mnemonic field contains a pair of the form

(statement class, code)

where statement class can be one of IS, DL and AD. For an IS, code is the
instruction opcode in the machine language. For DL and AD, code is an
ordinal number within the class, as listed below.
Thus, (AD,01) stands for assembler directive number 1, which is the directive
START.

Declaration statements    Assembler directives
DC    01                  START     01
DS    02                  END       02
                          ORIGIN    03
                          EQU       04
                          LTORG     05
IC for Imperative Statements

The first operand is represented by a single digit number which is a code for
a register (1-4) or the condition code itself (1-6 for LT-ANY).
The second operand, which is a memory operand, is represented by a pair of the form

(operand class, code)

where operand class is one of C (constant), S (symbol) and L (literal).

For a constant, the code field contains the internal representation of the
constant itself.

E.g. the operand descriptor for the statement

START 200 is (C,200)

For a symbol or literal, the code field contains the ordinal number of the
operand’s entry in SYMTAB or LITTAB.
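
A possible way to build the IC for an imperative statement is sketched below. The tuple layout is an assumption made for this sketch, the SYMTAB and LITTAB entry numbers assume the tables are ordered as in the figure of Pass I data structures, and the codes for DREG, EQ, GT and GE are inferred rather than stated in the notes.

```python
OPCODES = {"STOP": 0, "ADD": 1, "SUB": 2, "MULT": 3, "MOVER": 4, "MOVEM": 5,
           "COMP": 6, "BC": 7, "DIV": 8, "READ": 9, "PRINT": 10}
REGISTERS = {"AREG": 1, "BREG": 2, "CREG": 3, "DREG": 4}
CONDITIONS = {"LT": 1, "LE": 2, "EQ": 3, "GT": 4, "GE": 5, "ANY": 6}

# Entry order assumed to follow the figure (entry numbers start at 1).
SYMTAB = ["LOOP", "NEXT", "LAST", "A", "BACK", "B"]
LITTAB = ["='5'", "='1'"]


def intermediate_code(mnemonic, first, second):
    """Return ((statement class, code), first operand code, (operand class, code))."""
    first_code = CONDITIONS[first] if mnemonic == "BC" else REGISTERS[first]
    if second.startswith("="):
        # list.index finds the first matching entry only, so a literal repeated
        # in a later pool would need its entry number supplied by Pass I.
        operand = ("L", LITTAB.index(second) + 1)
    else:
        operand = ("S", SYMTAB.index(second) + 1)
    return ("IS", OPCODES[mnemonic]), first_code, operand


print(intermediate_code("MOVER", "AREG", "='5'"))
print(intermediate_code("MOVEM", "AREG", "A"))
print(intermediate_code("BC", "ANY", "NEXT"))
```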
LTORG

When an LTORG statement appears in the source program, it assigns memory
addresses to the literals in the current pool. These addresses are entered in the
address field of their LITTAB entries.

EQU

The EQU assembler directive simply equates a symbolic name to a numeric value.

E.g. Sunday EQU 1


Algorithm- Assembler Second Pass

1. code_area_address := address of code area;
   pooltab_ptr := 1;
   loc_cntr := 0;

2. While next statement is not an END statement

   (a) Clear machine_code_buffer;

   (b) If an LTORG statement
       (i) Process literals in LITTAB[POOLTAB[pooltab_ptr]]
           ... LITTAB[POOLTAB[pooltab_ptr+1]-1] similar to
           processing of constants in a DC statement, i.e. assemble
           the literals in machine_code_buffer.
       (ii) size := size of memory area required for literals;
       (iii) pooltab_ptr := pooltab_ptr + 1;

   (c) If a START or ORIGIN statement then
       (i) loc_cntr := value specified in operand field;
       (ii) size := 0;

   (d) If a declaration statement
       (i) If a DC statement then
           assemble the constant in machine_code_buffer;
       (ii) size := size of memory area required by DC/DS;

   (e) If an imperative statement then
       (i) Get operand address from SYMTAB or LITTAB;
       (ii) Assemble instruction in machine_code_buffer;
       (iii) size := size of instruction;

   (f) If size ≠ 0 then
       (i) Move contents of machine_code_buffer to the address
           code_area_address + loc_cntr;
       (ii) loc_cntr := loc_cntr + size;

3. (Processing of END statement)
   (a) Perform steps 2(b) and 2(f).
   (b) Write code_area into the output file.
Questions
Apr 2012
1. Explain the advantages of assembly language over
machine language.
2. Explain the role of Lexical analyzer.
3. Explain in brief about assembler directives.
4. Explain literal pools.
5. Explain system software and application software.
Discuss the importance of system software.
6. Discuss various assembly language statements.
7. Discuss the advantages of two pass assembler over
single pass assembler.
Apr 2010

1. Discuss the assembler functions in detail.
2. Explain the design issues of an assembler.
MAY 2010
1. What are the problems of single pass assembly?
2. What are literal pools? Discuss
3. Explain processing of literals in two passes of
assembler.
4. Explain pass II algorithm of the assembler.
May 2009
1. List the basic features of assembly language
programming which make it a lot easier than
machine language programming.
2. In a machine using the base displacement addressing
structure, what is the problem in permitting backward
references to literals situated in previous literal
pools?
3. Explain the design of two pass assembler.
MAY 2008
1. Discuss the major issues in Lexical analysis.